Preface


The 13th International Conference on Practice and Theory in Public Key Cryptography (PKC 2010) was held May 26-28, 2010, at the École Normale Supérieure (ENS) in Paris, France. PKC 2010 was sponsored by the International Association for Cryptologic Research (IACR), in cooperation with the École Normale Supérieure (ENS) and the Institut National de Recherche en Informatique et en Automatique (INRIA). The General Chairs of the conference were Michel Abdalla and Pierre-Alain Fouque.

The conference received a record number of 145 submissions, and each submission was assigned to at least 3 committee members. Submissions co-authored by members of the Program Committee were assigned to at least five committee members. Due to the large number of high-quality submissions, the review process was challenging, and we are deeply grateful to the 34 committee members and the 163 external reviewers for their outstanding work. After extensive discussions, the Program Committee selected 29 submissions for presentation during the conference, and these are the articles that are included in this volume. The best paper award went to Petros Mol and Scott Yilek for their paper “Chosen-Ciphertext Security from Slightly Lossy Trapdoor Functions.” The review process was run using the iChair software, written by Thomas Baignères and Matthieu Finiasz from EPFL, LASEC, Switzerland, and we are indebted to them for letting us use their software.

The program also included two invited talks: it was a great honor to have Daniele Micciancio and Jacques Stern as invited speakers. Their talks were entitled, respectively, “Duality in Lattice Based Cryptography” and “Mathematics, Cryptography, Security.” We would like to sincerely thank them for accepting our invitation and for contributing to the success of PKC 2010.

Finally, we would like to thank our sponsors Google, Ingenico, and Technicolor for their financial support, and all the people involved in the organization of this conference. In particular, we would like to thank the Office for Courses and Colloquiums (Bureau des Cours-Colloques) from INRIA and Gaëlle Dorkeld, as well as Jacques Beigbeder and Joëlle Isnard from ENS, for their diligent work and for making this conference possible. We also wish to thank Springer for publishing the proceedings in the Lecture Notes in Computer Science series.
Phong Q. Nguyen
David Pointcheval 
Simple and Efficient Public-Key Encryption
from Computational Diffie-Hellman in the 
Standard Model 


Kristiyan Haralambiev (1, *), Tibor Jager (2), Eike Kiltz (3, **), and Victor Shoup (4, ***)

1 Dept. of Computer Science, New York University, Courant Institute,

251 Mercer Street, New York, NY 10012, USA
kkh@cs.nyu.edu

2 Horst Görtz Institute for IT Security, Ruhr-University Bochum, Germany

tibor.jager@rub.de

3 Cryptology & Information Security Group, CWI, Amsterdam, The Netherlands
kiltz@cwi.nl

4 Dept. of Computer Science, New York University, Courant Institute,

251 Mercer Street, New York, NY 10012, USA
shoup@cs.nyu.edu


Abstract. This paper proposes practical chosen-ciphertext secure public-key encryption systems that are provably secure under the computational Diffie-Hellman assumption, in the standard model. Our schemes are conceptually simpler and more efficient than previous constructions.

We also show that in bilinear groups the size of the public key can be shrunk from $n$ to $2\sqrt{n}$ group elements, where $n$ is the security parameter.

1 Introduction 

Security against chosen-ciphertext attack (CCA) is nowadays considered to be the standard security notion for public-key encryption. In this work we are interested in practical schemes with proofs of security under mild security assumptions (such as the computational Diffie-Hellman assumption), without relying on heuristics such as the random oracle model [2].

ElGamal Encryption. Let $\mathbb{G}$ be a cyclic group generated by $g$. The ElGamal encryption scheme, described as a key-encapsulation mechanism (Gen, Enc, Dec), is as follows:

$$\mathsf{Gen}: sk = z,\ pk = Z = g^{z}; \qquad \mathsf{Enc}(pk): C = g^{r},\ K = Z^{r}; \qquad \mathsf{Dec}(sk, C): K = C^{z} \in \mathbb{G},$$

where all appearing exponents are chosen at random. It can be proved one-way (OW-CPA) secure under the computational Diffie-Hellman (DH) assumption,

* Supported by NSF award number CNS-0716690. 

** Supported by the research program Sentinels. 

*** Supported by NSF award number CNS-0716690.


P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010, LNCS 6056, pp. 1-18, 2010.
© International Association for Cryptologic Research 2010
but its semantic (IND-CPA) security is equivalent to the stronger DDH assumption. To obtain an IND-CPA secure variant from the DH assumption, one commonly uses the Goldreich-Levin [13] hard-core predicate $f_{\mathrm{gl}}$ with randomness $R$ to extract a pseudorandom bit from the Diffie-Hellman seed. By a standard randomness-reusing technique one obtains a scheme that encapsulates $n$-bit keys:

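$$\begin{aligned} \mathsf{Gen}_{\mathrm{dh}}&: sk_{\mathrm{dh}} = (z_1, \ldots, z_n),\quad pk_{\mathrm{dh}} = (Z_1 = g^{z_1}, \ldots, Z_n = g^{z_n}),\\ \mathsf{Enc}(pk)&: C_{\mathrm{dh}} = g^{r},\quad K_{\mathrm{dh}} = \big(f_{\mathrm{gl}}(Z_1^{r}, R), \ldots, f_{\mathrm{gl}}(Z_n^{r}, R)\big) \in \{0,1\}^{n}, \end{aligned} \tag{1}$$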

where decapsulation reconstructs the seed values $Z_i^{r}$ by computing $Z_i^{r} = C_{\mathrm{dh}}^{z_i}$. Combined with a one-time pad it yields an IND-CPA secure encryption scheme.
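The randomness-reusing construction in Equation (1) can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' implementation: the group is the order-1019 subgroup of $\mathbb{Z}_{2039}^*$ (far too small to be secure), $f_{\mathrm{gl}}$ outputs a single bit ($v = 1$) computed as the inner product mod 2 of the bit representations of the group element and $R$, and the names `gen`, `enc`, `dec` and `otp` are hypothetical.

```python
import secrets

# Toy parameters (assumption; insecure sizes): MOD = 2*P + 1 is a safe prime and
# the quadratic residues modulo MOD form a cyclic group of prime order P.
MOD = 2039
P = 1019    # group order, written p in the text
g = 4       # 4 = 2^2 is a non-trivial square, so it generates the order-P subgroup
U = 11      # bit length of the Goldreich-Levin randomness R (MOD < 2^11)


def f_gl(elem: int, R: int) -> int:
    """Goldreich-Levin bit: inner product mod 2 of the bits of elem and R."""
    return bin(elem & R).count("1") % 2


def gen(n: int):
    z = [secrets.randbelow(P) for _ in range(n)]      # sk_dh = (z_1, ..., z_n)
    Z = [pow(g, zi, MOD) for zi in z]                 # pk_dh = (g^{z_1}, ..., g^{z_n})
    R = secrets.randbelow(2 ** U)
    return (Z, R), z


def enc(pk):
    Z, R = pk
    r = secrets.randbelow(P)                          # one r is reused for all n seeds
    C = pow(g, r, MOD)
    K = [f_gl(pow(Zi, r, MOD), R) for Zi in Z]        # K_i = f_gl(Z_i^r, R)
    return C, K


def dec(z, R, C):
    return [f_gl(pow(C, zi, MOD), R) for zi in z]     # C^{z_i} = Z_i^r


def otp(bits, key):
    """One-time pad on bit lists; encryption and decryption are the same XOR."""
    return [b ^ k for b, k in zip(bits, key)]
```

For example, `pk, sk = gen(16)` followed by `C, K = enc(pk)` yields a 16-bit key with `dec(sk, pk[1], C) == K`, while only the single group element $C$ is transmitted regardless of $n$.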

IND-CCA Security from Decisional Assumptions. Whereas CPA-secure schemes can be constructed generically, building CCA-secure schemes seems more difficult and usually requires stronger hardness assumptions. The first practical CCA-secure encryption scheme (without random oracles) was proposed in a seminal paper by Cramer and Shoup [...]. Their construction was later generalized to hash proof systems [5]. However, the Cramer-Shoup encryption scheme and all its variants [...] inherently rely on decisional assumptions, e.g., the Decisional Diffie-Hellman (DDH) assumption or the quadratic residuosity assumption. Moreover, there are groups, such as certain elliptic curve groups with a bilinear pairing map, where the DDH assumption does not hold but the DH problem appears to be hard.

IND-CCA Security from Computational Assumptions. The DDH assumption has often been criticized as being too strong [3,12] and is in general wrong in certain cryptographically relevant groups [...]. Schemes based on the DH assumption are preferable but, surprisingly, even with strong tools such as the Cramer-Shoup framework [10] such schemes seem to be hard to obtain.

Canetti, Halevi and Katz [5] proposed the first practical public-key encryption scheme based on a computational assumption, namely the Bilinear DH assumption in bilinear groups. Later, as a general tool to construct cryptographic primitives secure against active attacks, Cash et al. [8] proposed the Twin Diffie-Hellman (2DH) assumption. Though seemingly a stronger assumption, the interactive Strong 2DH assumption (which is the 2DH assumption where the adversary is additionally given an oracle that decides the 2DH predicate for fixed bases) is implied by the standard DH assumption. Building on “IBE techniques” [4,5], Cash et al. obtained the first practical encryption scheme that is CCA-secure under the Strong 2DH assumption, and therefore also under the standard DH assumption. Here the decisional 2DH oracle provided by the Strong 2DH assumption plays a crucial role in distinguishing consistent from inconsistent ciphertexts. However, to prove IND-CCA security, [8] had to add $n$ group elements to the ciphertext of the scheme from Equation (1), which renders the scheme quite impractical. In independent work, Hanaoka and Kurosawa [14] used a different approach based on broadcast encryption and could thereby reduce the number of group elements in the ciphertexts to a constant. According to [14], their approach is not based on the twinning framework.
Recently, Hofheinz and Kiltz gave a CCA-secure encryption scheme based on the factoring assumption [18].

1.1 Our Contributions 

In this paper we propose a number of new encryption schemes that are CCA-secure under the standard DH assumption. We apply the Twin Diffie-Hellman framework from [8] to the CPA-secure scheme given in Equation (1); as a result, our schemes are simple and intuitive. As summarized in [15, Table 1], they improve the efficiency of prior schemes from [8,14].

A Scheme from Strong DH. To illustrate our main ideas, we first give a toy scheme that is IND-CCA secure under the Strong DH assumption [...]. (The Strong DH assumption states that the DH assumption holds even when the adversary is equipped with a fixed-base DDH oracle.) This is essentially the same scheme as ElGamal from Equation (1), but one more group element is added to the ciphertext:

$$\begin{aligned} \mathsf{Gen}_{\mathrm{sdh}}&: sk = (sk_{\mathrm{dh}}, x, x'),\quad pk = (pk_{\mathrm{dh}}, X = g^{x}, X' = g^{x'}),\\ \mathsf{Enc}_{\mathrm{sdh}}(pk)&: C = \big(C_{\mathrm{dh}}, (X^{t} X')^{r}\big),\quad K = K_{\mathrm{dh}}, \end{aligned} \tag{2}$$

where $t = T(C_{\mathrm{dh}})$ is the output of a target collision-resistant hash function. Decryption returns $K$ only if the ciphertext $C = (C_0, C_1)$ is consistent, i.e., if $C_0^{xt+x'} = C_1$; in all other cases it rejects and returns ⊥. The additional element $(X^{t} X')^{r}$ in the ciphertext serves as a handle for an all-but-one simulation technique (based on techniques from identity-based encryption [...]) that makes it possible to simulate the decryption oracle for all ciphertexts except the challenge ciphertext. This simulation technique works only if consistent ciphertexts can be distinguished from inconsistent ones, which is why we need the DDH oracle provided by the Strong DH assumption.
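The consistency check of scheme (2) can be illustrated in a toy group (the order-1019 subgroup of $\mathbb{Z}_{2039}^*$, insecure). A minimal sketch, assuming SHA-256 reduced modulo $p$ as a stand-in for the target collision-resistant hash $T$ and a one-bit $f_{\mathrm{gl}}$ (our choices, not the paper's); `None` plays the role of ⊥ and all function names are hypothetical. Decryption recomputes $C_0^{xt+x'}$ with the secret exponents, which is exactly the check that a reduction without those exponents cannot perform without a DDH oracle.

```python
import hashlib
import secrets

MOD, P, g, U = 2039, 1019, 4, 11   # toy group of prime order P (assumption; insecure)


def f_gl(elem, R):
    return bin(elem & R).count("1") % 2


def T(c0: int) -> int:
    # Stand-in for the TCR hash T: G -> Z_p (assumption: SHA-256 reduced mod p)
    return int.from_bytes(hashlib.sha256(str(c0).encode()).digest(), "big") % P


def gen(n):
    z = [secrets.randbelow(P) for _ in range(n)]
    x, x2 = secrets.randbelow(P), secrets.randbelow(P)         # x2 stands for x'
    R = secrets.randbelow(2 ** U)
    pk = {"Z": [pow(g, zi, MOD) for zi in z], "X": pow(g, x, MOD),
          "X2": pow(g, x2, MOD), "R": R}
    sk = {"z": z, "x": x, "x2": x2, "R": R}
    return pk, sk


def enc(pk):
    r = secrets.randbelow(P)
    c0 = pow(g, r, MOD)
    t = T(c0)
    c1 = pow(pow(pk["X"], t, MOD) * pk["X2"] % MOD, r, MOD)     # (X^t X')^r
    K = [f_gl(pow(Zi, r, MOD), pk["R"]) for Zi in pk["Z"]]
    return (c0, c1), K


def dec(sk, ct):
    c0, c1 = ct
    t = T(c0)
    if c1 != pow(c0, (sk["x"] * t + sk["x2"]) % P, MOD):        # is C1 = C0^{xt+x'}?
        return None                                              # reject (⊥)
    return [f_gl(pow(c0, zi, MOD), sk["R"]) for zi in sk["z"]]
```

Multiplying $C_1$ by $g$ always breaks consistency, so such a mauled ciphertext is rejected rather than decrypted to a related key.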

First Scheme from DH. Our first scheme, which is secure under the (standard) DH assumption, applies the twinning framework to the above idea by adding an additional element $(Y^{t} Y')^{r}$ to the ciphertext:

$$\begin{aligned} \mathsf{Gen}_{\mathrm{dh1}}&: sk = (sk_{\mathrm{dh}}, x, x', y, y'),\quad pk = (pk_{\mathrm{dh}}, X = g^{x}, X' = g^{x'}, Y = g^{y}, Y' = g^{y'}),\\ \mathsf{Enc}_{\mathrm{dh1}}(pk)&: C = \big(C_{\mathrm{dh}}, (X^{t} X')^{r}, (Y^{t} Y')^{r}\big),\quad K = K_{\mathrm{dh}}. \end{aligned} \tag{3}$$

Again, decryption returns $K$ only if the ciphertext is consistent, and ⊥ otherwise. By analogy to the scheme from Equation (2), it is IND-CCA secure under the Strong 2DH assumption which, by the Twinning theorem from [8], is implied by the standard DH assumption. Again, the decisional 2DH oracle provided by the Strong 2DH assumption is crucial for distinguishing consistent from inconsistent ciphertexts in the reduction.

Second Scheme from DH. Our second scheme from the DH assumption applies an “implicit rejection technique” to remove the second element from the ciphertext:

$$\begin{aligned} \mathsf{Gen}_{\mathrm{dh2}}&: sk = (sk_{\mathrm{dh}}, x, x', y, y'),\quad pk = (pk_{\mathrm{dh}}, X = g^{x}, X' = g^{x'}, Y = g^{y}, Y' = g^{y'}),\\ \mathsf{Enc}_{\mathrm{dh2}}(pk)&: C = \big(C_{\mathrm{dh}}, (X^{t} X')^{r}\big),\quad K = K_G \oplus K_{\mathrm{dh}},\ \text{where } K_G = G\big((Y^{t} Y')^{r}\big), \end{aligned} \tag{4}$$

where $G: \mathbb{G} \to \{0,1\}^{n}$ is a secure pseudorandom generator. Decryption returns $K$ only if the ciphertext $C = (C_0, C_1)$ is consistent, i.e., if $C_0^{xt+x'} = C_1$. In that case $K_G$ is computed as $K_G = G(C_0^{yt+y'})$. Unfortunately, we are not able to show full CCA security of this KEM; instead, we are able to prove the weaker constrained CCA (CCCA) security [16] under the DH assumption. A CCCA-secure KEM plus a symmetric authenticated encryption scheme (e.g., a MAC plus a one-time pad) yields CCA-secure encryption. The intuition behind the security is similar to that for the scheme from Equation (3), with the difference that, during the simulation, the values $Y$ and $Y'$ are set up such that, if the ciphertext is inconsistent, the simulated decryption produces a $K_G$ that is uniform in the adversary's view, and therefore $K = K_G \oplus K_{\mathrm{dh}}$ is also uniform. Consequently, when the KEM is combined with symmetric authenticated encryption, such inconsistent decryption queries are rejected by the symmetric cipher.
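To see why a key that is uniform in the adversary's view leads to rejection, consider the symmetric part of the hybrid scheme. The following is a minimal sketch, assuming HMAC-SHA256 as the MAC and a split of the KEM key into a one-time pad and a 32-byte MAC key (our choices, not prescribed by the paper; the helper names and `MAC_KEY_LEN` are hypothetical). Decrypting under an independent random key fails tag verification except with negligible probability.

```python
import hashlib
import hmac
import secrets

MAC_KEY_LEN = 32   # bytes of the KEM key reserved for the MAC (assumption)


def ae_encrypt(kem_key: bytes, msg: bytes):
    """One-time pad followed by HMAC-SHA256 over the ciphertext (encrypt-then-MAC)."""
    if len(kem_key) != len(msg) + MAC_KEY_LEN:
        raise ValueError("KEM key must cover the pad and the MAC key")
    pad, mac_key = kem_key[:len(msg)], kem_key[len(msg):]
    ct = bytes(m ^ k for m, k in zip(msg, pad))
    tag = hmac.new(mac_key, ct, hashlib.sha256).digest()
    return ct, tag


def ae_decrypt(kem_key: bytes, ct: bytes, tag: bytes):
    """Returns the plaintext, or None (reject) if the tag does not verify."""
    pad, mac_key = kem_key[:len(ct)], kem_key[len(ct):]
    expected = hmac.new(mac_key, ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None
    return bytes(c ^ k for c, k in zip(ct, pad))
```

Verifying the tag before removing the pad means a ciphertext decapsulated to a random key is rejected without revealing anything about the pad.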

Reducing the Size of the Public Keys. Our schemes are quite practical, except for the large public key, which consists of roughly $n$ group elements. We also propose two methods to reduce the size of the public key when our schemes are instantiated over bilinear groups. Most interestingly, we note that the public key can be shrunk from $n$ to $2\sqrt{n}$ elements by “implicitly defining” the $n$ elements of $pk_{\mathrm{dh}}$ as $Z_{i,j} := e(Z_i, Z'_j)$ for $i, j \in [\sqrt{n}]$. (Here $e: \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ is a symmetric bilinear map.) Note that now only the $2\sqrt{n}$ elements $Z_i, Z'_j$ need to be stored in the public key.¹ Furthermore, in bilinear groups it is also possible to move the $n$ values $Z_1, \ldots, Z_n$ from the public key $pk_{\mathrm{dh}}$ into the system parameters, which can be shared among many users. In that case the public key contains only one group element, but the system parameters are still of size roughly $n$. We remark that the observation of putting public-key elements into the system parameters is not new and has been made before, e.g., for Waters' IBE scheme [21]. Finally, we also sketch how our ideas can be extended to construct an IBE scheme. All our bilinear constructions are CCA secure under the Bilinear DH (BDH) assumption.
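The index structure behind the $2\sqrt{n}$-element public key can be checked with a toy "pairing". This sketch is an illustration only, under loudly stated assumptions: elements of $\mathbb{G}$ are represented by their discrete logarithms modulo $p = 1019$ (so the map is bilinear but offers no security), $\mathbb{G}_T$ is the order-1019 subgroup of $\mathbb{Z}_{2039}^*$, and encapsulation simply raises each implicit seed $Z_{i,j}$ to $r$. The function names are hypothetical, and the paper's bilinear schemes may differ in detail.

```python
import math
import secrets

MOD, P, GT = 2039, 1019, 4   # G_T: order-P subgroup of Z_MOD^*, generated by 4 (toy)
U = 11


# Toy symmetric pairing (assumption; cryptographically useless): an element g^a of G
# is stored as the integer a mod P, so e(g^a, g^b) = GT^(a*b). The map is bilinear
# and non-degenerate, but discrete logarithms in G are trivial.
def pair(a: int, b: int) -> int:
    return pow(GT, (a * b) % P, MOD)


def f_gl(elem, R):
    return bin(elem & R).count("1") % 2


def gen(n):
    m = math.isqrt(n)
    if m * m != n:
        raise ValueError("this sketch assumes n is a perfect square")
    z = [secrets.randbelow(P) for _ in range(m)]
    z2 = [secrets.randbelow(P) for _ in range(m)]
    R = secrets.randbelow(2 ** U)
    # Only 2*sqrt(n) elements Z_i = g^{z_i} and Z'_j = g^{z'_j} are published.
    pk = {"Z": list(z), "Z2": list(z2), "R": R}
    sk = {"z": z, "z2": z2, "R": R}
    return pk, sk


def enc(pk):
    r = secrets.randbelow(P)
    c0 = r                                            # C_0 = g^r in the additive encoding
    K = [f_gl(pow(pair(Zi, Zj), r, MOD), pk["R"])     # implicit seed Z_ij = e(Z_i, Z'_j), to the r
         for Zi in pk["Z"] for Zj in pk["Z2"]]
    return c0, K


def dec(sk, c0):
    # e(C_0^{z_i}, Z'_j) = e(g, g)^{r z_i z'_j} = Z_ij^r
    return [f_gl(pair(c0 * zi % P, zj), sk["R"])
            for zi in sk["z"] for zj in sk["z2"]]
```

With $n = 16$, the public key holds 8 elements while 16 key bits are derived, one per pair $(i, j)$.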

2 Preliminaries 

2.1 Notation 

In the following we let $(\mathbb{G}_\kappa)_{\kappa \in \mathbb{N}}$ be a family of prime-order groups, indexed by the security parameter $\kappa$. Occasionally we write $\mathbb{G}$ as shorthand for some group $\mathbb{G}_\kappa \in (\mathbb{G}_\kappa)_{\kappa \in \mathbb{N}}$ when the reference to the security parameter $\kappa$ is clear. We denote with

¹ We remark that this is a generic technique that may also be applied to other Diffie-Hellman based constructions suffering from large public keys, such as the DDH-based lossy trapdoor functions in [...].
$\mathrm{poly}(\kappa)$ an unspecified positive integer-valued polynomial, and with $\mathrm{negl}(\kappa)$ a negligible function in $\kappa$, that is, $|\mathrm{negl}(\kappa)| = o(1/\kappa^{c})$ for every positive integer $c$. For a positive integer $n$, we denote with $[n]$ the set $[n] = \{1, \ldots, n\}$.


2.2 Key Encapsulation Mechanisms 

Let $n = n(\kappa)$ be a polynomial. A key-encapsulation mechanism (Gen, Enc, Dec) with key space $\{0,1\}^{n}$ consists of three polynomial-time algorithms (PTAs). Via $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$ the randomized key-generation algorithm produces public/secret keys for security parameter $\kappa \in \mathbb{N}$; via $(C, K) \leftarrow \mathsf{Enc}(pk)$ the randomized encapsulation algorithm creates a uniformly distributed symmetric key $K \in \{0,1\}^{n}$, together with a ciphertext $C$; via $K \leftarrow \mathsf{Dec}(sk, C)$ the possessor of secret key $sk$ decrypts ciphertext $C$ to get back a key $K$, which is an element of $\{0,1\}^{n}$ or a special rejection symbol ⊥. For consistency, we require that for all $\kappa \in \mathbb{N}$ and all $(C, K) \leftarrow \mathsf{Enc}(pk)$ we have $\Pr[\mathsf{Dec}(sk, C) = K] = 1$, where the probability is taken over the choice of $(pk, sk) \leftarrow \mathsf{Gen}(1^\kappa)$ and the coins of all the algorithms in the expression above.

Chosen-Ciphertext Security. The common requirement for a KEM is indistinguishability against chosen-ciphertext attacks (IND-CCA) [10], where an adversary is allowed to adaptively query a decapsulation oracle with ciphertexts to obtain the corresponding session key. More formally, for an adversary $\mathcal{A}$ we define the advantage function

$$\mathrm{Adv}^{\mathrm{CCA}}_{\mathcal{A},\mathsf{KEM}}(\kappa) = \left|\Pr\left[b = b' \;:\; \begin{array}{l} (pk, sk) \leftarrow \mathsf{Gen}(1^\kappa);\ (C, K_0) \leftarrow \mathsf{Enc}(pk)\\ K_1 \leftarrow \{0,1\}^{n};\ b \leftarrow \{0,1\}\\ b' \leftarrow \mathcal{A}^{\mathsf{Dec}(\cdot)}(pk, K_b, C) \end{array}\right] - \frac{1}{2}\right|,$$

where oracle $\mathsf{Dec}(C_i)$ returns $K_i \leftarrow \mathsf{Dec}(sk, C_i)$. The restriction is that $\mathcal{A}$ is only allowed to query $\mathsf{Dec}(\cdot)$ on ciphertexts $C_i$ different from the challenge ciphertext $C$. A key encapsulation mechanism is said to be indistinguishable against chosen-ciphertext attacks (IND-CCA) if for all PTA adversaries $\mathcal{A}$, the advantage $\mathrm{Adv}^{\mathrm{CCA}}_{\mathcal{A},\mathsf{KEM}}(\kappa)$ is a negligible function in $\kappa$.
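The experiment can be phrased as a small game harness. As an illustration (our example, not from the paper), the following sketch runs it against the plain ElGamal KEM from Section 1 over a toy group (the order-1019 subgroup of $\mathbb{Z}_{2039}^*$, insecure), with the key space taken to be $\mathbb{G}$. The adversary exploits malleability: $C \cdot g$ is a legal query because it differs from $C$, and it decapsulates to $K_0 \cdot Z$. All names are hypothetical.

```python
import secrets

MOD, P, g = 2039, 1019, 4   # toy group: order-P subgroup of Z_MOD^* (assumption; insecure)


def elgamal_gen():
    z = secrets.randbelow(P)
    return pow(g, z, MOD), z                       # (pk, sk) = (Z, z)


def elgamal_enc(Z):
    r = secrets.randbelow(P)
    return pow(g, r, MOD), pow(Z, r, MOD)          # (C, K) = (g^r, Z^r)


def elgamal_dec(z, C):
    return pow(C, z, MOD)                          # K = C^z


def ind_cca_experiment(adversary) -> bool:
    """One run of the IND-CCA game; True iff the adversary's guess b' equals b."""
    pk, sk = elgamal_gen()
    C_star, K0 = elgamal_enc(pk)
    K1 = pow(g, secrets.randbelow(P), MOD)         # uniformly random key from G
    b = secrets.randbelow(2)

    def dec_oracle(C):
        if C == C_star:                            # querying the challenge is forbidden
            raise ValueError("query on the challenge ciphertext")
        return elgamal_dec(sk, C)

    return adversary(pk, (K0, K1)[b], C_star, dec_oracle) == b


def malleability_adversary(pk, K, C_star, dec_oracle):
    # (C* g)^z = K0 * Z, so the oracle answer reveals whether K is the real key K0.
    K_shifted = dec_oracle(C_star * g % MOD)
    return 0 if K_shifted == K * pk % MOD else 1
```

The adversary fails only when the random key $K_1$ happens to equal $K_0$, so its advantage is essentially $1/2$; this is why ElGamal is only CPA-secure and why the schemes above add consistency checks.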

It was proved in [10] that an IND-CCA secure KEM and a CCA-secure symmetric encryption scheme yield an IND-CCA secure hybrid encryption scheme.

Constrained Chosen-Ciphertext Security. Chosen-ciphertext security can be relaxed to indistinguishability against constrained chosen-ciphertext attacks (IND-CCCA) [16]. Intuitively, one only allows the adversary to make a decapsulation query if it already has some “a priori knowledge” about the decapsulated key. This partial knowledge about the key is modeled implicitly by letting the adversary additionally provide an efficiently computable Boolean predicate $\mathrm{pred}: \{0,1\}^{n} \to \{0,1\}$. If $\mathrm{pred}(K) = 1$, then the encapsulated key $K$ is returned, and ⊥ otherwise. The amount of uncertainty the adversary has about the session key (denoted as plaintext uncertainty $\mathrm{uncert}_{\mathcal{A}}$) is measured by the fraction of keys for which the predicate evaluates to 1. We require this fraction to be negligible for every query, i.e., the adversary has to have high a priori knowledge about the encapsulated key when making a decapsulation query. More formally, for an adversary $\mathcal{A}$ we define the advantage function


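$$\mathrm{Adv}^{\mathrm{CCCA}}_{\mathcal{A},\mathsf{KEM}}(\kappa) = \left|\Pr\left[b = b' \;:\; \begin{array}{l} (pk, sk) \leftarrow \mathsf{Gen}(1^\kappa);\ (C, K_0) \leftarrow \mathsf{Enc}(pk)\\ K_1 \leftarrow \{0,1\}^{n};\ b \leftarrow \{0,1\}\\ b' \leftarrow \mathcal{A}^{\mathsf{CDec}(\cdot,\cdot)}(pk, K_b, C) \end{array}\right] - \frac{1}{2}\right|,$$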


where oracle $\mathsf{CDec}(\mathrm{pred}_i, C_i)$ first computes $K_i \leftarrow \mathsf{Dec}(sk, C_i)$. If $K_i = \bot$ or $\mathrm{pred}_i(K_i) = 0$, it returns ⊥; otherwise, it returns $K_i$. The restriction is that $\mathcal{A}$ is only allowed to query $\mathsf{CDec}(\mathrm{pred}_i, C_i)$ on predicates $\mathrm{pred}_i$ that are provided as PTAs and on ciphertexts $C_i$ different from the challenge ciphertext $C$.

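To adversary $\mathcal{A}$ in the above experiment we also associate $\mathcal{A}$'s plaintext uncertainty $\mathrm{uncert}_{\mathcal{A}}(\kappa)$ when making $Q$ decapsulation queries, measured by

$$\mathrm{uncert}_{\mathcal{A}}(\kappa) = \frac{1}{Q} \sum_{1 \le i \le Q} \Pr_{K \leftarrow \{0,1\}^{n}}\big[\mathrm{pred}_i(K) = 1\big],$$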


where $\mathrm{pred}_i: \{0,1\}^{n} \to \{0,1\}$ is the predicate $\mathcal{A}$ submits in the $i$-th decapsulation query. Finally, a key encapsulation mechanism is said to be indistinguishable against constrained chosen-ciphertext attacks (IND-CCCA) if for all PTA adversaries $\mathcal{A}$ with negligible $\mathrm{uncert}_{\mathcal{A}}(\kappa)$, the advantage $\mathrm{Adv}^{\mathrm{CCCA}}_{\mathcal{A},\mathsf{KEM}}(\kappa)$ is a negligible function in $\kappa$.
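The constrained oracle and the uncertainty measure can be made concrete for small $n$. A minimal sketch (the helper names and the exhaustive enumeration are our own and are feasible only for tiny $n$): a predicate that pins down the key exactly has uncertainty $2^{-n}$, whereas a predicate that fixes only one bit has uncertainty $1/2$, so an adversary using it would not have negligible uncertainty.

```python
from itertools import product


def make_cdec_oracle(dec, sk, challenge):
    """Constrained decapsulation oracle CDec(pred, C); None stands for ⊥."""
    def cdec(pred, C):
        if C == challenge:
            raise ValueError("query on the challenge ciphertext is not allowed")
        K = dec(sk, C)
        if K is None or pred(K) == 0:
            return None
        return K
    return cdec


def plaintext_uncertainty(preds, n):
    """uncert_A = (1/Q) * sum_i Pr_{K <- {0,1}^n}[pred_i(K) = 1], by exhaustive enumeration."""
    keys = list(product((0, 1), repeat=n))
    return sum(sum(pred(K) for K in keys) / len(keys) for pred in preds) / len(preds)
```

Keys are represented here as tuples of bits so that a predicate is simply a Python function from such a tuple to 0 or 1.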

It was proved in [16] that an IND-CCCA secure KEM plus a symmetric encryption scheme secure in the sense of authenticated encryption yields an IND-CCA secure hybrid encryption scheme.

We refer to the full version [15, Appendix A] for other definitions of standard cryptographic primitives such as hash functions and pseudorandom generators.


2.3 Diffie-Hellman Assumptions 

Let $\mathbb{G} = \mathbb{G}_\kappa$ be a cyclic group generated by $g$. Define

$$\mathrm{dh}(A, B) := C, \quad \text{where } A = g^{a},\ B = g^{b}, \text{ and } C = g^{ab}. \tag{5}$$

The problem of computing $\mathrm{dh}(A, B)$ given random $A, B \in \mathbb{G}$ is the computational Diffie-Hellman (DH) problem. The DH assumption asserts that this problem is hard, that is, $\Pr[\mathcal{A}(A, B) = \mathrm{dh}(A, B)] \le \mathrm{negl}(\kappa)$ for all probabilistic polynomial-time algorithms $\mathcal{A}$. The DH predicate is defined as

$$\mathrm{dhp}(A, B, C) := \big(\mathrm{dh}(A, B) = C\big).$$

The Strong DH assumption states that it is hard to compute $\mathrm{dh}(A, B)$, given random $A, B \in \mathbb{G}$, along with access to a decision oracle for the predicate $\mathrm{dhp}(A, \cdot, \cdot)$, which on input $(B, C)$ returns $\mathrm{dhp}(A, B, C)$.

Let dh be defined as in (5). Define the function

$$2\mathrm{dh}: \mathbb{G}^{3} \to \mathbb{G}^{2}, \qquad (A_1, A_2, B) \mapsto \big(\mathrm{dh}(A_1, B), \mathrm{dh}(A_2, B)\big).$$
This function, introduced in [8], is called the twin DH function. One can also define a corresponding twin DH predicate:

$$2\mathrm{dhp}(A_1, A_2, B, C_1, C_2) := \big(2\mathrm{dh}(A_1, A_2, B) = (C_1, C_2)\big).$$

The twin Diffie-Hellman assumption states that it is hard to compute $2\mathrm{dh}(A_1, A_2, B)$, given random $A_1, A_2, B \in \mathbb{G}$. The strong twin DH assumption states that it is hard to compute $2\mathrm{dh}(A_1, A_2, B)$, given random $A_1, A_2, B \in \mathbb{G}$, along with access to a decision oracle for the predicate $2\mathrm{dhp}(A_1, A_2, \cdot, \cdot, \cdot)$, which on input $(B, C_1, C_2)$ returns $2\mathrm{dhp}(A_1, A_2, B, C_1, C_2)$. It is clear that the (strong) twin DH assumption implies the DH assumption.
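For intuition, dh, dhp, 2dh and 2dhp can be evaluated directly in a toy group by brute-forcing discrete logarithms. This is a sketch under the assumption of the tiny order-1019 subgroup of $\mathbb{Z}_{2039}^*$; in a cryptographic group dh and 2dh are believed to be hard to compute, which is exactly what the assumptions above assert. The function names mirror the notation of this section but are otherwise hypothetical.

```python
MOD, P, g = 2039, 1019, 4   # toy group of prime order P (assumption; insecure)


def dlog(A: int) -> int:
    """Brute-force discrete log base g; feasible only because the toy group is tiny."""
    acc = 1
    for a in range(P):
        if acc == A:
            return a
        acc = acc * g % MOD
    raise ValueError("not in the subgroup generated by g")


def dh(A, B):
    return pow(B, dlog(A), MOD)          # dh(g^a, g^b) = g^{ab}


def dhp(A, B, C):
    return dh(A, B) == C


def twin_dh(A1, A2, B):
    return dh(A1, B), dh(A2, B)


def twin_dhp(A1, A2, B, C1, C2):
    return twin_dh(A1, A2, B) == (C1, C2)
```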

We will make use of a result from [8], which essentially states that the DH assumption implies the strong twin Diffie-Hellman assumption.

Lemma 1 (Theorem 3 of [8]). Let $\mathbb{G}$ be a group of prime order $p$, $\log_2 p = \mathrm{poly}(\kappa)$. Suppose $\mathcal{A}$ is an adversary against the strong twin Diffie-Hellman problem in $\mathbb{G}$, running in polynomial time in $\kappa$ and having non-negligible success probability. Then there exists a polynomial-time adversary $\mathcal{B}$ against the computational Diffie-Hellman problem in $\mathbb{G}$ having non-negligible success probability.
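The proof of Lemma 1 in [8] is based on a "trapdoor test" which, as we recall it, lets a reduction that knows only $A_1$ decide $2\mathrm{dhp}(A_1, A_2, \cdot, \cdot, \cdot)$ for a specially generated $A_2$. The following sketch shows that test in a toy group (parameter sizes and function names are our assumptions): pick $r, s$, publish $A_2 = g^{s} / A_1^{r}$, and accept $(B, C_1, C_2)$ iff $C_1^{r} C_2 = B^{s}$. A correct twin tuple always passes, since with $A_1 = g^{a_1}$ we get $C_1^{r} C_2 = B^{r a_1 + (s - r a_1)} = B^{s}$; for the bound on false acceptances, consult [8].

```python
import secrets

MOD, P, g = 2039, 1019, 4   # toy group of prime order P (assumption; insecure)


def trapdoor_setup(A1: int):
    """Given A1 (its discrete log is NOT used), pick r, s and set A2 = g^s / A1^r."""
    r = 1 + secrets.randbelow(P - 1)          # r in [1, p-1]
    s = secrets.randbelow(P)
    A2 = pow(g, s, MOD) * pow(pow(A1, r, MOD), -1, MOD) % MOD
    return A2, (r, s)


def trapdoor_test(trapdoor, B: int, C1: int, C2: int) -> bool:
    """Accepts iff C1^r * C2 == B^s; used as a stand-in for 2dhp(A1, A2, B, C1, C2)."""
    r, s = trapdoor
    return pow(C1, r, MOD) * C2 % MOD == pow(B, s, MOD)
```

Because $r \neq 0$, multiplying either $C_1$ or $C_2$ by $g$ always makes the test fail in this sketch.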

2.4 Hard-Core Functions 

In the following we denote with $f_{\mathrm{gl}}: \mathbb{G} \times \{0,1\}^{u} \to \{0,1\}^{v}$ a Goldreich-Levin hard-core function [13] for $\mathrm{dh}(A, B)$ with randomness space $\{0,1\}^{u}$ and range $\{0,1\}^{v}$, where $u$ and $v$ are suitable integers (depending on the given group representation).

The following lemma is from [8, Theorem 9].

Lemma 2. Let $\mathbb{G} = \mathbb{G}_\kappa$ be a prime-order group generated by $g$. Let $A_1, A_2, B \in \mathbb{G}$ be random group elements, $R \leftarrow \{0,1\}^{u}$, and let $K = f_{\mathrm{gl}}(\mathrm{dh}(A_1, B), R)$. Let $U_v \leftarrow \{0,1\}^{v}$ be uniformly random. Suppose there exists a probabilistic polynomial-time algorithm $\mathcal{B}$ having access to an oracle computing $2\mathrm{dhp}(A_1, A_2, \cdot, \cdot, \cdot)$ and distinguishing the distributions

$$\Delta_{\mathrm{dh}} = (g, A_1, A_2, B, K, R) \quad \text{and} \quad \Delta_{\mathrm{rand}} = (g, A_1, A_2, B, U_v, R)$$

with non-negligible advantage. Then there exists a probabilistic polynomial-time algorithm computing $\mathrm{dh}(A, B)$ on input $(A, B)$ with non-negligible success probability.

3 Chosen-Ciphertext Secure Key Encapsulation 

In this section we build our first CCA-secure key-encapsulation mechanism whose security is based on the DH assumption.

Let $\mathbb{G} = \mathbb{G}_\kappa$ be a group of prime order $p$ and let $n = n(\kappa)$ be a polynomial. Let $T_s: \mathbb{G} \to \mathbb{Z}_p$ be a hash function with key $s$ that is assumed to be target collision resistant (see [18, Appendix A] for a formal definition). Let $\mathsf{KEM}_{\mathrm{dh1}} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ be defined as follows.
Gen($1^\kappa$): Choose a random generator $g \leftarrow \mathbb{G}$ and randomness $R \leftarrow \{0,1\}^{u}$ for $f_{\mathrm{gl}}$. Choose a random seed $s$ for the hash function $T_s$, choose random integers $x, x', y, y', z_1, \ldots, z_n \leftarrow \mathbb{Z}_p$, and set $X = g^{x}$, $X' = g^{x'}$, $Y = g^{y}$, $Y' = g^{y'}$, $Z_1 = g^{z_1}, \ldots, Z_n = g^{z_n}$. Set

$$pk = (g, X, X', Y, Y', Z_1, \ldots, Z_n, R, s) \quad \text{and} \quad sk = (pk, x, x', y, y', z_1, \ldots, z_n)$$

and return $(pk, sk)$.

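Enc($pk$): On input of public key $pk$, sample $r \leftarrow \mathbb{Z}_p$. Set $C_0 = g^{r}$, $t = T_s(C_0)$, $C_1 = (X^{t} X')^{r}$, $C_2 = (Y^{t} Y')^{r}$, and

$$K = \big(f_{\mathrm{gl}}(Z_1^{r}, R), \ldots, f_{\mathrm{gl}}(Z_n^{r}, R)\big).$$

Return $((C_0, C_1, C_2), K)$.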

Dec($sk, (C_0, C_1, C_2)$): Set $t = T_s(C_0)$. If $C_1 \neq C_0^{xt+x'}$ or $C_2 \neq C_0^{yt+y'}$, then return ⊥. Otherwise compute and return

$$K = \big(f_{\mathrm{gl}}(C_0^{z_1}, R), \ldots, f_{\mathrm{gl}}(C_0^{z_n}, R)\big).$$
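A toy transcription of $\mathsf{KEM}_{\mathrm{dh1}}$ may help when checking the algebra. This is a sketch under stated assumptions, not a secure implementation: the group is the order-1019 subgroup of $\mathbb{Z}_{2039}^*$, $f_{\mathrm{gl}}$ outputs one bit ($v = 1$), and $T_s$ is modeled as SHA-256 over $s \,\|\, C_0$ reduced mod $p$. The names `x2` and `y2` (for $x'$ and $y'$) and the dictionary layout are ours.

```python
import hashlib
import secrets

MOD, P, g, U = 2039, 1019, 4, 11    # toy group and GL randomness length (assumptions)


def f_gl(elem, R):
    return bin(elem & R).count("1") % 2


def T(s: bytes, c0: int) -> int:
    # Keyed stand-in for the TCR hash T_s: G -> Z_p (assumption: SHA-256 over s || C0, mod p)
    return int.from_bytes(hashlib.sha256(s + str(c0).encode()).digest(), "big") % P


def gen(n):
    x, x2, y, y2 = (secrets.randbelow(P) for _ in range(4))
    z = [secrets.randbelow(P) for _ in range(n)]
    s, R = secrets.token_bytes(16), secrets.randbelow(2 ** U)
    pk = {"X": pow(g, x, MOD), "X2": pow(g, x2, MOD), "Y": pow(g, y, MOD),
          "Y2": pow(g, y2, MOD), "Z": [pow(g, zi, MOD) for zi in z], "R": R, "s": s}
    sk = {"pk": pk, "x": x, "x2": x2, "y": y, "y2": y2, "z": z}
    return pk, sk


def enc(pk):
    r = secrets.randbelow(P)
    c0 = pow(g, r, MOD)
    t = T(pk["s"], c0)
    c1 = pow(pow(pk["X"], t, MOD) * pk["X2"] % MOD, r, MOD)   # (X^t X')^r
    c2 = pow(pow(pk["Y"], t, MOD) * pk["Y2"] % MOD, r, MOD)   # (Y^t Y')^r
    K = [f_gl(pow(Zi, r, MOD), pk["R"]) for Zi in pk["Z"]]
    return (c0, c1, c2), K


def dec(sk, ct):
    c0, c1, c2 = ct
    pk = sk["pk"]
    t = T(pk["s"], c0)
    if (c1 != pow(c0, (sk["x"] * t + sk["x2"]) % P, MOD)
            or c2 != pow(c0, (sk["y"] * t + sk["y2"]) % P, MOD)):
        return None                                            # reject (⊥)
    return [f_gl(pow(c0, zi, MOD), pk["R"]) for zi in sk["z"]]
```

Exponents are reduced modulo $p$ because every ciphertext component lies in the order-$p$ subgroup.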

Theorem 1. Let $T_s$ be a target collision-resistant hash function and suppose that the computational Diffie-Hellman assumption holds in $\mathbb{G}$. Then $\mathsf{KEM}_{\mathrm{dh1}}$ is IND-CCA secure.

In the proof we use a trick from [1] to set up the public key and challenge ciphertext in a way that permits an all-but-one simulation. This enables the simulator to embed the given Diffie-Hellman challenge while at the same time being able to decapsulate any ciphertext submitted by the adversary. We combine this technique with the twinning technique from [8] to be able to check the consistency of submitted ciphertexts.

Proof. In the following we write $(C_0^*, C_1^*, C_2^*)$ to denote the challenge ciphertext with corresponding key $K_0^*$, denote with $K_1^*$ the random key chosen by the IND-CCA experiment, and set $t^* = T_s(C_0^*)$.

We proceed in a sequence of games. We start with a game where the challenger proceeds like the standard IND-CCA game (i.e., $K_0^*$ is a real key and $K_1^*$ is a random key), and end up with a game where both $K_0^*$ and $K_1^*$ are chosen uniformly at random. Then we show that all games are computationally indistinguishable under the computational Diffie-Hellman assumption. Let $W_i$ denote the event that $\mathcal{A}$ outputs $b'$ such that $b' = b$ in Game $i$.


Game 0. This is the standard IND-CCA game. By definition we have

$$\left|\Pr[W_0] - \tfrac{1}{2}\right| = \mathrm{Adv}^{\mathrm{CCA}}_{\mathcal{A},\mathsf{KEM}_{\mathrm{dh1}}}(\kappa).$$
Game 1. We proceed as in Game 0, except that the challenger returns ⊥ if the adversary queries to decapsulate a ciphertext $(C_0, C_1, C_2)$ with $C_0 = C_0^*$. Note that the probability that the adversary submits a ciphertext such that $C_0 = C_0^*$ before seeing the challenge ciphertext is bounded by $q/p$, where $q$ is the number of chosen-ciphertext queries issued by $\mathcal{A}$. Since $q = \mathrm{poly}(\kappa)$, we have $q/p \le \mathrm{negl}(\kappa)$. Moreover, after the challenge, a ciphertext with $C_0 = C_0^*$ is inconsistent, and thus gets rejected anyway, if $C_1 \neq C_1^*$ or $C_2 \neq C_2^*$, and it is rejected by definition if $C_1 = C_1^*$ and $C_2 = C_2^*$. Therefore

$$\left|\Pr[W_1] - \Pr[W_0]\right| \le \mathrm{negl}(\kappa).$$


Game 2. We define Game 2 like Game 1, except for the following: now the challenger aborts if the adversary asks to decapsulate a ciphertext $(C_0, C_1, C_2)$ with $C_0 \neq C_0^*$ and $T_s(C_0) = T_s(C_0^*)$. By the target collision resistance of $T_s$, we have

$$\left|\Pr[W_2] - \Pr[W_1]\right| \le \mathrm{negl}(\kappa).$$


Game 3. We define Game 3 like Game 2, except that we sample $K_0^* \leftarrow \{0,1\}^{nv}$ uniformly at random. Note that now both $K_0^*$ and $K_1^*$ are chosen uniformly at random, thus we have

$$\Pr[W_3] = \tfrac{1}{2}.$$

We claim that

$$\left|\Pr[W_3] - \Pr[W_2]\right| \le \mathrm{negl}(\kappa)$$

under the computational Diffie-Hellman assumption. We prove this by a hybrid argument. To this end, we define a sequence of hybrid games $H_0, \ldots, H_n$ such that $H_0$ equals Game 2 and $H_n$ equals Game 3. Then we argue that hybrid $H_i$ is indistinguishable from hybrid $H_{i-1}$ for $i \in \{1, \ldots, n\}$ under the computational Diffie-Hellman assumption. The claim follows, since $n = n(\kappa)$ is a polynomial. We define $H_0$ exactly like Game 2. Then, for $i$ from 1 to $n$, in hybrid $H_i$ we set the first $iv$ bits of $K_0^*$ to independent random bits, and otherwise proceed exactly like in hybrid $H_{i-1}$. Thus, hybrid $H_n$ proceeds exactly like Game 3.

Let $E_i$ denote the event that $\mathcal{A}$ outputs 1 in Hybrid $i$. Suppose

$$\left|\Pr[E_0] - \Pr[E_n]\right| \ge 1/\mathrm{poly}_0(\kappa), \tag{6}$$

that is, the success probability of $\mathcal{A}$ in Hybrid 0 is not negligibly close to the success probability in Hybrid $n$. Then there must exist an index $i$ such that $|\Pr[E_{i-1}] - \Pr[E_i]| \ge 1/\mathrm{poly}(\kappa)$: if $|\Pr[E_{i-1}] - \Pr[E_i]| \le \mathrm{negl}(\kappa)$ held for all $i$, then by the triangle inequality we would have $|\Pr[E_0] - \Pr[E_n]| \le n \cdot \mathrm{negl}(\kappa)$, which is negligible because $n$ is polynomial.

Suppose there exists an algorithm $\mathcal{A}$ for which (6) holds. Then we can construct an adversary $\mathcal{B}$ having access to a $2\mathrm{dhp}$ oracle and distinguishing the distributions $\Delta_{\mathrm{dh}}$ and $\Delta_{\mathrm{rand}}$, which by Lemma 2 is sufficient to prove security under the computational Diffie-Hellman assumption in $\mathbb{G}$. Adversary $\mathcal{B}$ receives a challenge $\delta = (g, A_1, A_2, B, L, R)$ as input, and has access to an oracle
evaluating $2\mathrm{dhp}(A_1, A_2, \cdot, \cdot, \cdot)$. $\mathcal{B}$ guesses an index $i \in [n]$, which with probability at least $1/n$ corresponds to an index $i$ such that $|\Pr[E_{i-1}] - \Pr[E_i]| = \max_j |\Pr[E_{j-1}] - \Pr[E_j]|$, and proceeds as follows.

Set-up of the public key. $\mathcal{B}$ picks random integers $d, f \leftarrow \mathbb{Z}_p$ and $e \leftarrow \mathbb{Z}_p^*$, and sets $X = A_1^{e}$, $X' = A_1^{-et^*} g^{d}$, $Y = A_2^{e}$, $Y' = A_2^{-et^*} g^{f}$, and $Z_i = A_1$, where $t^* = T_s(B)$. $R$ is used as randomness for $f_{\mathrm{gl}}(\cdot, R)$; the rest of the public key is generated as in Game 0. Note that $X, X', Y, Y', Z_i$ are independent and uniformly distributed group elements.

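Handling decapsulation queries. When $\mathcal{A}$ issues a decapsulation query $(C_0 = g^{r}, C_1, C_2)$, $\mathcal{B}$ computes $t = T_s(C_0)$,

$$\hat{X} = \big(C_1 / C_0^{d}\big)^{1/(e(t - t^*))} \quad \text{and} \quad \hat{Y} = \big(C_2 / C_0^{f}\big)^{1/(e(t - t^*))}.$$

Assuming $t \neq t^*$ and that the ciphertext is formed correctly (that is, $C_0 = g^{r}$, $C_1 = (X^{t} X')^{r}$, and $C_2 = (Y^{t} Y')^{r}$), we have

$$\hat{X} = \left(\frac{(X^{t} X')^{r}}{(g^{r})^{d}}\right)^{1/(e(t - t^*))} = \left(\frac{A_1^{er(t - t^*)} g^{rd}}{g^{rd}}\right)^{1/(e(t - t^*))} = A_1^{r} = \mathrm{dh}(A_1, C_0),$$

and likewise $\hat{Y} = A_2^{r} = \mathrm{dh}(A_2, C_0)$. $\mathcal{B}$ tests the consistency of ciphertexts by querying $2\mathrm{dhp}(A_1, A_2, C_0, \hat{X}, \hat{Y})$, which returns 1 if and only if $\hat{X} = \mathrm{dh}(A_1, C_0)$ and $\hat{Y} = \mathrm{dh}(A_2, C_0)$.

If this test is passed, then $\mathcal{B}$ sets $K = (K_1, \ldots, K_n)$ with $K_i = f_{\mathrm{gl}}(\hat{X}, R)$ and $K_j = f_{\mathrm{gl}}(C_0^{z_j}, R)$ for $j \in [n] \setminus \{i\}$, and returns $K$; otherwise it returns ⊥. Since by Games 1 and 2 we have $t \neq t^*$ for all queries issued by $\mathcal{A}$, $\mathcal{B}$ can answer all decapsulation queries correctly.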

Set-up of the challenge ciphertext. $\mathcal{B}$ sets $C_0^* = B$, $C_1^* = B^{d}$, and $C_2^* = B^{f}$. Note that, by the set-up of $X, X', Y, Y'$, this is a consistent ciphertext, since we have

$$(X^{t^*} X')^{\log_g B} = \big(A_1^{et^*} A_1^{-et^*} g^{d}\big)^{\log_g B} = B^{d}$$

and (similarly) $(Y^{t^*} Y')^{\log_g B} = B^{f}$. Then $\mathcal{B}$ samples $i - 1$ uniformly random strings $K_1, \ldots, K_{i-1} \in \{0,1\}^{v}$, sets $K_i = L$ and $K_j = f_{\mathrm{gl}}((C_0^*)^{z_j}, R)$ for $j$ from $i + 1$ to $n$, and outputs the challenge $((C_0^*, C_1^*, C_2^*), (K_1, \ldots, K_n))$.

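Now, if $\delta \leftarrow \Delta_{\mathrm{dh}}$, then $L = f_{\mathrm{gl}}(\mathrm{dh}(B, Z_i), R)$. Thus $\mathcal{A}$'s view when interacting with $\mathcal{B}$ is identical to hybrid $H_{i-1}$. If $\delta \leftarrow \Delta_{\mathrm{rand}}$, then $\mathcal{A}$'s view is identical to hybrid $H_i$. Thus $\mathcal{B}$ can use $\mathcal{A}$ to distinguish $\delta \in \Delta_{\mathrm{dh}}$ from $\delta \in \Delta_{\mathrm{rand}}$. □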
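The two algebraic facts used by $\mathcal{B}$ can be checked numerically. The sketch below (toy group and helper names are our assumptions) builds $X, X'$ from $A_1$ alone and then recovers $\mathrm{dh}(A_1, C_0) = A_1^{r}$ from a consistent $C_1$ for a tag $t \neq t^*$ without ever using $\log_g A_1$; the accompanying checks also confirm that the challenge component $C_1^* = B^{d}$ is consistent for $t^*$.

```python
MOD, P, g = 2039, 1019, 4   # toy group of prime order P (assumption; insecure)


def simulate_public_key(A1, e, d, t_star):
    """Reduction's public-key elements: X = A1^e and X' = A1^(-e t*) g^d."""
    X = pow(A1, e, MOD)
    X2 = pow(A1, (-e * t_star) % P, MOD) * pow(g, d, MOD) % MOD
    return X, X2


def recover_dh(C0, C1, t, e, d, t_star):
    """For t != t*, returns (C1 / C0^d)^(1/(e(t - t*))), i.e., dh(A1, C0) on consistent input."""
    inv_exp = pow(e * (t - t_star) % P, -1, P)       # inverse of the exponent modulo the group order
    ratio = C1 * pow(pow(C0, d, MOD), -1, MOD) % MOD
    return pow(ratio, inv_exp, MOD)
```

The requirement $e \neq 0$ and $t \neq t^*$ is exactly what makes the exponent $e(t - t^*)$ invertible modulo $p$.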

We remark that the same proof strategy can be used to prove that the KEM given in Equation (2) (Section 1) is CCA-secure under the Strong DH assumption.

4 Constrained Chosen-Ciphertext Secure Key Encapsulation

In this section we build a more efficient variant of our first CCA-secure key-encapsulation mechanism, which we cannot prove CCA-secure. However, we can prove that it is secure in the sense of constrained CCA security, which is sufficient to obtain CCA-secure hybrid encryption. Again, the security is based on the DH assumption.

Let $\mathbb{G} = \mathbb{G}_\kappa$ be a group of prime order $p$ and let $n = n(\kappa)$ be a polynomial. Let $\mathsf{KEM}_{\mathrm{dh2}} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ be defined as follows.

Gen($1^\kappa$): Choose a random generator $g \leftarrow \mathbb{G}$ and randomness $R \leftarrow \{0,1\}^{u}$ for $f_{\mathrm{gl}}$. Choose a random seed $s$ for the hash function $T_s: \mathbb{G} \to \mathbb{Z}_p$, choose random integers $x, x', y, y', z_1, \ldots, z_n \leftarrow \mathbb{Z}_p$, and set $X = g^{x}$, $X' = g^{x'}$, $Y = g^{y}$, $Y' = g^{y'}$, $Z_1 = g^{z_1}, \ldots, Z_n = g^{z_n}$. Let $G: \mathbb{G} \to \{0,1\}^{nv}$ be a pseudorandom generator. Set

$$pk = (g, X, X', Y, Y', Z_1, \ldots, Z_n, R, s, G) \quad \text{and} \quad sk = (pk, x, x', y, y', z_1, \ldots, z_n)$$

and return $(pk, sk)$.

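Enc($pk$): On input of public key $pk$, sample $r \leftarrow \mathbb{Z}_p$. Set $C_0 = g^{r}$, $t = T_s(C_0)$, $C_1 = (X^{t} X')^{r}$, $K_G = G\big((Y^{t} Y')^{r}\big)$, and

$$K_{\mathrm{dh}} = \big(f_{\mathrm{gl}}(Z_1^{r}, R), \ldots, f_{\mathrm{gl}}(Z_n^{r}, R)\big).$$

Set $K = K_G \oplus K_{\mathrm{dh}}$ and return $((C_0, C_1), K)$.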

Dec($sk, (C_0, C_1)$): Set $t = T_s(C_0)$. If $C_1 \neq C_0^{xt+x'}$, then return ⊥. Otherwise compute $K_G = G(C_0^{yt+y'})$ and

$$K_{\mathrm{dh}} = \big(f_{\mathrm{gl}}(C_0^{z_1}, R), \ldots, f_{\mathrm{gl}}(C_0^{z_n}, R)\big),$$

and return $K = K_G \oplus K_{\mathrm{dh}}$.
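As for $\mathsf{KEM}_{\mathrm{dh1}}$, a toy transcription may help. Assumptions (ours, not the paper's): an insecure toy group (the order-1019 subgroup of $\mathbb{Z}_{2039}^*$), $v = 1$, $T_s$ modeled by SHA-256 over $s \,\|\, C_0$, and the pseudorandom generator $G$ modeled by SHAKE-256 expanded to $n$ bits; names are hypothetical. Note that real decryption still performs the explicit check on $C_1$; as described in Section 1, the "implicit rejection" concerns how the proof simulates decryption.

```python
import hashlib
import secrets

MOD, P, g, U = 2039, 1019, 4, 11   # toy group and GL randomness length (assumptions)


def f_gl(elem, R):
    return bin(elem & R).count("1") % 2


def T(s: bytes, c0: int) -> int:
    # Stand-in for the keyed TCR hash T_s (assumption: SHA-256 over s || C0, mod p)
    return int.from_bytes(hashlib.sha256(s + str(c0).encode()).digest(), "big") % P


def prg(elem: int, nbits: int):
    # Stand-in for the PRG G: G -> {0,1}^n (assumption: SHAKE-256 of the encoding)
    stream = hashlib.shake_256(str(elem).encode()).digest((nbits + 7) // 8)
    return [(stream[i // 8] >> (i % 8)) & 1 for i in range(nbits)]


def gen(n):
    x, x2, y, y2 = (secrets.randbelow(P) for _ in range(4))
    z = [secrets.randbelow(P) for _ in range(n)]
    pk = {"X": pow(g, x, MOD), "X2": pow(g, x2, MOD), "Y": pow(g, y, MOD),
          "Y2": pow(g, y2, MOD), "Z": [pow(g, zi, MOD) for zi in z],
          "R": secrets.randbelow(2 ** U), "s": secrets.token_bytes(16)}
    return pk, {"pk": pk, "x": x, "x2": x2, "y": y, "y2": y2, "z": z}


def enc(pk):
    r = secrets.randbelow(P)
    c0 = pow(g, r, MOD)
    t = T(pk["s"], c0)
    c1 = pow(pow(pk["X"], t, MOD) * pk["X2"] % MOD, r, MOD)             # (X^t X')^r
    KG = prg(pow(pow(pk["Y"], t, MOD) * pk["Y2"] % MOD, r, MOD), len(pk["Z"]))
    Kdh = [f_gl(pow(Zi, r, MOD), pk["R"]) for Zi in pk["Z"]]
    return (c0, c1), [a ^ b for a, b in zip(KG, Kdh)]                   # K = K_G xor K_dh


def dec(sk, ct):
    c0, c1 = ct
    pk = sk["pk"]
    t = T(pk["s"], c0)
    if c1 != pow(c0, (sk["x"] * t + sk["x2"]) % P, MOD):
        return None                                                      # reject (⊥)
    KG = prg(pow(c0, (sk["y"] * t + sk["y2"]) % P, MOD), len(sk["z"]))   # G(C0^{yt+y'})
    Kdh = [f_gl(pow(c0, zi, MOD), pk["R"]) for zi in sk["z"]]
    return [a ^ b for a, b in zip(KG, Kdh)]
```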

Theorem 2. Let $T_s$ be a target collision-resistant hash function, let $G$ be a pseudorandom generator, and suppose that the computational Diffie-Hellman assumption holds in $\mathbb{G}$. Then $\mathsf{KEM}_{\mathrm{dh2}}$ is IND-CCCA secure.

Since we removed one element from the ciphertext (which was crucial for applying the twinning technique in the proof of Theorem 1 to check the consistency of ciphertexts), we have to use different means to prove the constrained chosen-ciphertext security of $\mathsf{KEM}_{\mathrm{dh2}}$. Here we exploit the new set-up of the encapsulated key, which allows us to reject invalid ciphertexts “implicitly.” Due to space restrictions, the proof is deferred to the full version [15].

5 Reducing the Size of the Public Key 

Let $(\mathbb{G}, \mathbb{G}_T)$ be a bilinear group that is equipped with an efficiently computable pairing $e: \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$. (See, e.g., [6,14].) In this section we show that by instantiating our scheme from Equation (2) (Section 1) in bilinear groups we are able to reduce the size of the public key considerably.
5.1 Bilinear Diffie-Hellman Assumption

Let

$$\mathrm{bdh}(A, B, C) := D, \quad \text{where } A = g^{a},\ B = g^{b},\ C = g^{c}, \text{ and } D = e(g, g)^{abc}. \tag{7}$$

The problem of computing $\mathrm{bdh}(A, B, C)$ given random $A, B, C \in \mathbb{G}$ is the computational Bilinear Diffie-Hellman (BDH) problem. The BDH assumption [...] asserts that this problem is hard, that is, $\Pr[\mathcal{A}(A, B, C) = \mathrm{bdh}(A, B, C)] \le \mathrm{negl}(\kappa)$ for all probabilistic polynomial-time algorithms $\mathcal{A}$.

In the bilinear setting, the Goldreich-Levin theorem [13] gives us the following lemma for $f_{\mathrm{gl}}: \mathbb{G}_T \times \{0,1\}^{u} \to \{0,1\}^{v}$.

Lemma 3. Let $\mathbb{G} = \mathbb{G}_\kappa$ be a prime-order group generated by $g$ equipped with
a pairing $e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$. Let $A, B, C \leftarrow \mathbb{G}$ be random group elements,
$R \leftarrow \{0,1\}^u$, and let $K = f_{gl}(\mathrm{bdh}(A, B, C), R)$. Let $U \leftarrow \{0,1\}^v$ be uniformly
random. Suppose there exists a probabilistic polynomial-time algorithm $\mathcal{B}$
distinguishing the distributions

$$\Delta_{\mathrm{bdh}} = (g, A, B, C, K, R) \quad \text{and} \quad \Delta_{\mathrm{rand}} = (g, A, B, C, U, R)$$

with non-negligible advantage. Then there exists a probabilistic polynomial-time
algorithm computing $\mathrm{bdh}(A, B, C)$ on input $(A, B, C)$ with non-negligible success
probability, hence breaking the BDH assumption.
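Lemma 3 treats $f_{gl}$ abstractly, and this excerpt does not spell out how it consumes its randomness $R$. As a minimal sketch, here is one standard multi-bit Goldreich-Levin instantiation under two assumptions that are ours, not the paper's: a $\mathbb{G}_T$ element is encoded as an $\ell$-bit integer, and $R$ is split into $v$ strings of $\ell$ bits each (so $u = v\ell$). Each output bit is an inner product modulo 2. The names `f_gl`, `inner_product_bit` and `sample_R` are hypothetical.

```python
from __future__ import annotations

import secrets


def inner_product_bit(x: int, r: int) -> int:
    """Goldreich-Levin bit <x, r> mod 2 over the binary expansions of x and r."""
    return bin(x & r).count("1") % 2


def f_gl(x: int, R: list[int]) -> list[int]:
    """One inner-product bit per ell-bit chunk of R, so v = len(R) output bits.

    x is the (assumed) integer encoding of a target-group element.
    """
    return [inner_product_bit(x, r) for r in R]


def sample_R(v: int, ell: int) -> list[int]:
    """Sample the public randomness R as v uniformly random ell-bit strings."""
    return [secrets.randbits(ell) for _ in range(v)]


if __name__ == "__main__":
    # 0b1100 & 0b1010 = 0b1000 (one 1-bit), & 0b0110 = 0b0100 (one), & 0b1111 = 0b1100 (two)
    print(f_gl(0b1100, [0b1010, 0b0110, 0b1111]))  # [1, 1, 0]
```

The lemma itself only needs $f_{gl}$ to be a Goldreich-Levin-style hardcore function. The inner-product form is chosen here because each output bit is easy to check by hand.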

5.2 Public-Key Encryption with Public Keys of Size $O(1)$

Our first idea is a variant where the elements $sys = (g, X, X', Z_1, \ldots, Z_n) \in \mathbb{G}^{n+3}$
can be put into the system parameters (which can be shared among many users),
so that the public key contains only a single group element $Y$. Our encryption
scheme can be viewed as a BDH variant of a Decisional BDH scheme from [7,20].
It is defined as follows.

Gen($1^\kappa$): Given the system parameters $sys$, choose a random integer $y \leftarrow \mathbb{Z}_p$ and
set $Y = g^y$. Set

$pk = Y$ and $sk = y$

and return $(pk, sk)$.

Enc($pk$): On input of public key $pk$, sample $r \leftarrow \mathbb{Z}_p$. Set $C_0 = g^r$, $t = T(C_0)$,
$C_1 = (X^t X')^r$, and $K = (K_1, \ldots, K_n)$, where

$K_i = f_{gl}(e(Y^r, Z_i), R)$, for $i \in [1, n]$.

Return $((C_0, C_1), K)$.

Dec($sk$, $(C_0, C_1)$): Set $t = T(C_0)$. If $e(C_0, X^t X') \neq e(g, C_1)$ then return $\perp$. Otherwise, compute,

for each $i \in [1, n]$,

$K_i = f_{gl}(e(C_0, Z_i)^y, R)$
and return $K = (K_1, \ldots, K_n) \in \{0,1\}^{nv}$.

Note that the consistency of the ciphertext is publicly verifiable, i.e., anyone
can check whether a ciphertext is consistent.
Theorem 3. Let T be a target collision-resistant hash function and suppose
that the computational Bilinear Diffie-Hellman assumption holds in G. Then the 
above scheme is an IND-CCA secure KEM. 

Proof. We proceed in a sequence of games, similarly to the proof of Theorem 1.

As before, we write $(C_0^*, C_1^*)$ to denote the challenge ciphertext with
corresponding key $K_0^*$, denote with $K_1^*$ the random key chosen by the IND-CCA
experiment, and set $t^* = T_s(C_0^*)$.

We start with a game where the challenger proceeds like in the standard
IND-CCA game (i.e., $K_0^*$ is a real key and $K_1^*$ is a random key), and end up with
a game where both $K_0^*$ and $K_1^*$ are chosen uniformly at random. Then we show
that all games are computationally indistinguishable under the computational
Bilinear Diffie-Hellman assumption. Let $W_i$ denote the event that $\mathcal{A}$ outputs $b'$
such that $b' = b$ in Game $i$.

Game 0. This is the standard IND-CCA game. By definition we have

$$\Pr[W_0] = \frac{1}{2} + \mathrm{Adv}^{\mathrm{CCA}}_{\mathcal{A},\mathsf{KEM}_{\mathrm{bdh1}}}(\kappa).$$

Game 1. We proceed as in Game 0, except that the challenger aborts if the
adversary queries to decapsulate a ciphertext $(C_0', C_1')$ with $C_0' = C_0^*$. Note that
the probability that the adversary submits a ciphertext such that $C_0' = C_0^*$
before seeing the challenge ciphertext is bounded by $q/p$, where $q$ is the number
of chosen-ciphertext queries issued by $\mathcal{A}$. Since $q = \mathrm{poly}(\kappa)$, we have $q/p \le
\mathrm{negl}(\kappa)$. Moreover, a ciphertext is inconsistent, and thus gets rejected, if $C_0' = C_0^*$
and $C_1' \neq C_1^*$, and it is rejected by definition if $C_0' = C_0^*$ and $C_1' = C_1^*$. Therefore

$$|\Pr[W_1] - \Pr[W_0]| \le \mathrm{negl}(\kappa).$$

Game 2. We define Game 2 like Game 1 except for the following. Now the
challenger aborts if the adversary asks to decapsulate a ciphertext $(C_0', C_1')$
with $C_0' \neq C_0^*$ and $T_s(C_0') = T_s(C_0^*)$. By the target collision resistance of $T_s$, we
have

$$|\Pr[W_2] - \Pr[W_1]| \le \mathrm{negl}(\kappa).$$


Game 3. We define Game 3 like Game 2 except that we sample $K_0^* \leftarrow \{0,1\}^{nv}$
uniformly at random. Note that now both $K_0^*$ and $K_1^*$ are chosen uniformly at random,
thus we have

$$\Pr[W_3] = \frac{1}{2}.$$

We claim that

$$|\Pr[W_3] - \Pr[W_2]| \le \mathrm{negl}(\kappa)$$

under the computational Bilinear Diffie-Hellman assumption. We prove this by a
hybrid argument. To this end, we define a sequence of hybrid games $H_0, \ldots, H_n$,
such that $H_0$ equals Game 2 and $H_n$ equals Game 3. Then we argue that
hybrid $H_i$ is indistinguishable from hybrid $H_{i-1}$ for $i \in \{1, \ldots, n\}$ under the
computational Bilinear Diffie-Hellman assumption. The claim follows, since $n = n(\kappa)$
is a polynomial. We define $H_0$ exactly like Game 2. Then, for $i$ from 1 to $n$, in
hybrid $H_i$ we set the first $iv$ bits of $K_0^*$ to independent random bits, and
proceed otherwise exactly like in hybrid $H_{i-1}$. Thus, hybrid $H_n$ proceeds exactly
like Game 3.

Let $E_i$ denote the event that $\mathcal{A}$ outputs 1 in hybrid $H_i$. Suppose that

$$|\Pr[E_0] - \Pr[E_n]| = 1/\mathrm{poly}_0(\kappa), \tag{8}$$

that is, the success probability of $\mathcal{A}$ in hybrid $H_0$ is not negligibly close to its
success probability in hybrid $H_n$. Note that then there must exist an index $i$ such
that $|\Pr[E_{i-1}] - \Pr[E_i]| = 1/\mathrm{poly}(\kappa)$ (since if $|\Pr[E_{i-1}] - \Pr[E_i]| \le \mathrm{negl}(\kappa)$ for
all $i$, then we would have $|\Pr[E_0] - \Pr[E_n]| \le n \cdot \mathrm{negl}(\kappa) = \mathrm{negl}(\kappa)$).
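The counting step in the parenthetical remark is the triangle inequality over the $n$ consecutive hybrids:

```latex
\bigl|\Pr[E_0]-\Pr[E_n]\bigr|
  \;\le\; \sum_{i=1}^{n}\bigl|\Pr[E_{i-1}]-\Pr[E_i]\bigr|
  \;\le\; n\cdot\max_{i\in[n]}\bigl|\Pr[E_{i-1}]-\Pr[E_i]\bigr| .
```

Hence, if (8) holds, the largest single gap is at least $1/(n \cdot \mathrm{poly}_0(\kappa))$. This is still noticeable because $n = n(\kappa)$ is polynomial.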

Suppose that there exists an algorithm $\mathcal{A}$ for which (8) holds. Then we
can construct an adversary $\mathcal{B}$ distinguishing the distributions $\Delta_{\mathrm{bdh}}$ and $\Delta_{\mathrm{rand}}$,
which by Lemma 3 is sufficient to prove security under the computational
Bilinear Diffie-Hellman assumption in $\mathbb{G}$. Adversary $\mathcal{B}$ receives a challenge
$\delta = (g, A, B, C, L, R)$ as input, guesses an index $i \in [n]$, which with
probability at least $1/n$ corresponds to the index $i$ such that $|\Pr[E_{i-1}] - \Pr[E_i]| =
\max_{i'} |\Pr[E_{i'-1}] - \Pr[E_{i'}]|$, and proceeds as follows:

Set-up of the system parameters. $\mathcal{B}$ picks random integers $d, e, f \leftarrow \mathbb{Z}_p$,
and sets $X = A^e$, $X' = A^{-et^*} g^d$, and $Z_i = A$, where $t^* = T(C)$. The
remaining elements $Z_j = g^{z_j}$, $j \neq i$, are generated as in Game 0, so $\mathcal{B}$ knows
their exponents $z_j$. Note that $C, X, X', Z_i$ are
independent and uniformly distributed group elements.

Set-up of the public key. $\mathcal{B}$ sets $Y = B$.

Handling decapsulation queries. When $\mathcal{A}$ issues a decapsulation query
$(C_0 = g^r, C_1)$, $\mathcal{B}$ computes $t = T_s(C_0)$ and tests the consistency of the
ciphertext by verifying

$$e(C_0, X^t X') = e(g, C_1).$$

If the equality holds, then $\mathcal{B}$ sets $K = (K_1, \ldots, K_n)$ as $K_j =
f_{gl}(e(C_0, Y)^{z_j}, R)$ for $j \in [n] \setminus \{i\}$ and $K_i = f_{gl}(e(\hat{X}, Y), R)$, where
$\hat{X} := (C_1 / C_0^d)^{1/(et - et^*)}$. Note that

$$\hat{X} = \left((X^t X')^r / (g^r)^d\right)^{1/(et - et^*)} = \left(A^{r(et - et^*)} g^{rd} / g^{rd}\right)^{1/(et - et^*)} = A^r = \mathrm{dh}(A, C_0),$$

and therefore $e(\hat{X}, Y) = e(A^r, B) = e(Y^r, Z_i)$, as required.

Since by Game 2 we have $t \neq t^*$, $\mathcal{B}$ can correctly answer all decapsulation
queries issued by $\mathcal{A}$.

Set-up of the challenge ciphertext. $\mathcal{B}$ sets $C_0^* = C$ and $C_1^* = C^d$. Note
that, by the set-up of $X, X'$, this is a consistent ciphertext, since we have

$$(X^{t^*} X')^{\log_g C} = (A^{et^*} A^{-et^*} g^d)^{\log_g C} = C^d.$$
Then $\mathcal{B}$ samples $i - 1$ uniformly random groups of $v$ bits $K_1^*, \ldots, K_{i-1}^*$, sets
$K_i^* = L$ and $K_j^* = f_{gl}(e(C_0^*, Y)^{z_j}, R)$ for $j$ from $i + 1$ to $n$, and outputs the
challenge $((C_0^*, C_1^*), (K_1^*, \ldots, K_n^*))$.

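Now, if $\delta \leftarrow \Delta_{\mathrm{bdh}}$ then we have $L = f_{gl}(\mathrm{bdh}(A, B, C), R)$, and $\mathrm{bdh}(A, B, C) = e(Y^{\log_g C}, Z_i)$ is
exactly the real $i$-th key block. Thus $\mathcal{A}$'s view when interacting with $\mathcal{B}$ is identical to hybrid $H_{i-1}$. If $\delta \leftarrow \Delta_{\mathrm{rand}}$, then $\mathcal{A}$'s view is
identical to hybrid $H_i$. Thus $\mathcal{B}$ can use $\mathcal{A}$ to distinguish $\delta \leftarrow \Delta_{\mathrm{bdh}}$ from $\delta \leftarrow \Delta_{\mathrm{rand}}$. □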
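The two algebraic identities used by $\mathcal{B}$ in this proof involve no pairing, so they can be checked in any prime-order group. The first recovers $\hat{X} = A^r$ from a consistent decapsulation query when $t \neq t^*$. The second shows that the challenge ciphertext $(C, C^d)$ is consistent. The sketch below checks both in a toy Schnorr group: $Q = 2039 = 2 \cdot 1019 + 1$, with generator $4$ of the subgroup of order $1019$. The parameters are deliberately tiny and insecure, SHA-256 stands in for $T_s$, and the function names are hypothetical.

```python
import hashlib
import secrets

# Toy Schnorr group: Q = 2P + 1 is a safe prime and G = 4 generates the
# subgroup of quadratic residues mod Q, which has prime order P. These
# parameters are far too small to be secure; they only exercise the exponent
# algebra of B's simulation.
P = 1019
Q = 2 * P + 1  # 2039
G = 4


def T(c0: int) -> int:
    """Stand-in for the hash T_s, mapped into Z_P (SHA-256; an assumption)."""
    return int.from_bytes(hashlib.sha256(str(c0).encode()).digest(), "big") % P


def rand_exp() -> int:
    """Uniform nonzero exponent in Z_P."""
    return secrets.randbelow(P - 1) + 1


def simulate(a: int, c: int, d: int, e: int, r: int):
    """Return (extracted X_hat, true A^r, challenge-consistency flag),
    or None when t == t* (the case excluded by Game 2)."""
    A, C = pow(G, a, Q), pow(G, c, Q)
    t_star = T(C)
    X = pow(A, e, Q)                                       # X  = A^e
    X2 = pow(A, (-e * t_star) % P, Q) * pow(G, d, Q) % Q   # X' = A^(-e t*) g^d

    # Well-formed decapsulation query (C_0, C_1) = (g^r, (X^t X')^r).
    C0 = pow(G, r, Q)
    t = T(C0)
    if t == t_star:
        return None
    C1 = pow(pow(X, t, Q) * X2 % Q, r, Q)

    # X_hat = (C_1 / C_0^d)^(1 / (e t - e t*)), with the exponent inverted mod P.
    inv = pow((e * t - e * t_star) % P, -1, P)
    X_hat = pow(C1 * pow(pow(C0, d, Q), -1, Q) % Q, inv, Q)

    # Challenge ciphertext (C, C^d) is consistent: (X^{t*} X')^c == C^d.
    challenge_ok = pow(pow(X, t_star, Q) * X2 % Q, c, Q) == pow(C, d, Q)
    return X_hat, pow(A, r, Q), challenge_ok
```

The check works because $X^t X' = A^{e(t - t^*)} g^d$. Dividing $C_1$ by $C_0^d$ strips the $g^{rd}$ term, and raising the result to $1/(e(t - t^*)) \bmod p$ leaves $A^r$. This requires Python 3.8 or later for `pow(x, -1, m)`.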


5.3 Public-Key Encryption with Public Key of Size $O(\sqrt{n})$

Our second idea reduces the size of the public key from $\approx n$ to $\approx 2\sqrt{n}$ group
elements (and no system parameters). Assume $n$ is a square and set $\eta := \sqrt{n}$.
The public key contains elements $Z_1, Z_1', \ldots, Z_\eta, Z_\eta' \in \mathbb{G}$, which implicitly define
$\eta^2 = n$ distinct elements $Z_{i,j} = e(Z_i, Z_j')$ in the target group $\mathbb{G}_T$. In our new
scheme these elements can be used in place of $Z_1, \ldots, Z_n$.

Gen($1^\kappa$): Choose a random generator $g \leftarrow \mathbb{G}$ and randomness $R \leftarrow \{0,1\}^u$ for
$f_{gl}$. Choose a random seed $s$ for the hash function $T_s$, choose random integers
$x, x', z_1, z_1', \ldots, z_\eta, z_\eta' \leftarrow \mathbb{Z}_p$, and set $X = g^x$, $X' = g^{x'}$, $Z_1 = g^{z_1}$, $Z_1' = g^{z_1'}$,
$\ldots$, $Z_\eta = g^{z_\eta}$, $Z_\eta' = g^{z_\eta'}$. Set

$pk = (g, X, X', Z_1, Z_1', \ldots, Z_\eta, Z_\eta', R, s)$ and $sk = (pk, x, x', z_1, z_1', \ldots, z_\eta, z_\eta')$
and return $(pk, sk)$.

Enc($pk$): On input of public key $pk$, sample $r \leftarrow \mathbb{Z}_p$. Set $C_0 = g^r$, $t = T_s(C_0)$,
$C_1 = (X^t X')^r$, and $K = (K_{1,1}, \ldots, K_{\eta,\eta})$, where

$K_{i,j} = f_{gl}(e(Z_i, Z_j')^r, R)$, for $i, j \in [1, \eta]$.

Return $((C_0, C_1), K)$.

Dec($sk$, $(C_0, C_1)$): First reject if $e(C_0, X^t X') \neq e(g, C_1)$, where $t = T_s(C_0)$. Otherwise, for each
$i, j \in [1, \eta]$ compute

$K_{i,j} = f_{gl}(e(C_0^{z_i}, Z_j'), R)$
and return $K = (K_{1,1}, \ldots, K_{\eta,\eta}) \in \{0,1\}^{nv}$.

Like in the previous scheme, the consistency of the ciphertext is publicly
verifiable. Furthermore, decryption can alternatively check the consistency of
the ciphertext by testing whether $C_0^{xt+x'} = C_1$.
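The following sketch works in a toy exponent model, where every group element is stored as its discrete logarithm modulo a prime and the pairing becomes multiplication of exponents. The model is purely illustrative and insecure, SHA-256 and inner-product bits stand in for $T_s$ and $f_{gl}$, and all names are hypothetical. It shows how $2\eta$ public-key elements yield $\eta^2$ key blocks $K_{i,j}$. It also shows that the private test $C_0^{xt+x'} = C_1$ accepts exactly the ciphertexts the public pairing test accepts.

```python
from __future__ import annotations

import hashlib
import secrets

# Toy, INSECURE exponent model: g^a is stored as a mod P and the pairing
# e(g^a, g^b) = e(g, g)^(a*b) is a*b mod P. Only the algebra is checked.
P = 2**61 - 1  # prime stand-in for the group order p (assumption)
V = 8          # bits per key block (the parameter v; illustrative)


def pair(a: int, b: int) -> int:
    return a * b % P


def T(c0: int) -> int:
    """Stand-in for the hash T_s (SHA-256 mod P; an assumption)."""
    return int.from_bytes(hashlib.sha256(c0.to_bytes(8, "big")).digest(), "big") % P


def f_gl(x: int, R: list[int]) -> list[int]:
    """Hypothetical Goldreich-Levin instantiation: inner-product bits mod 2."""
    return [bin(x & r).count("1") % 2 for r in R]


def gen(eta: int):
    """Public key with 2*eta elements Z_i, Z'_i that define eta**2 key blocks."""
    x, x2 = secrets.randbelow(P), secrets.randbelow(P)
    z = [secrets.randbelow(P) for _ in range(eta)]
    zp = [secrets.randbelow(P) for _ in range(eta)]
    R = [secrets.randbits(61) for _ in range(V)]
    pk = {"X": x, "X2": x2, "Z": z, "Zp": zp, "R": R}  # exponents stand for group elements
    sk = {"pk": pk, "x": x, "x2": x2, "z": z, "zp": zp}
    return pk, sk


def public_check(pk: dict, c0: int, c1: int) -> bool:
    """Pairing test e(C_0, X^t X') == e(g, C_1), with g = 1 in the exponent model."""
    return pair(c0, (T(c0) * pk["X"] + pk["X2"]) % P) == pair(1, c1)


def enc(pk: dict):
    r = secrets.randbelow(P)
    c0, t = r, T(r)                                   # C_0 = g^r
    c1 = r * (t * pk["X"] + pk["X2"]) % P             # C_1 = (X^t X')^r
    K = [[f_gl(pair(zi, zj) * r % P, pk["R"]) for zj in pk["Zp"]]  # e(Z_i, Z'_j)^r
         for zi in pk["Z"]]
    return (c0, c1), K


def dec(sk: dict, ct: tuple[int, int]):
    c0, c1 = ct
    pk, t = sk["pk"], T(c0)
    # Private alternative to the pairing test: C_0^(x t + x') == C_1.
    if c0 * (sk["x"] * t + sk["x2"]) % P != c1:
        return None
    return [[f_gl(pair(c0 * zi % P, zj), pk["R"]) for zj in pk["Zp"]]  # e(C_0^{z_i}, Z'_j)
            for zi in sk["z"]]
```

With `eta = 3`, the public key holds six elements $Z_i, Z_i'$. Encapsulation still produces nine key blocks, matching the $\approx 2\sqrt{n}$ versus $n$ trade-off described above.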

Theorem 4. Let $T_s$ be a target collision-resistant hash function and suppose
that the computational Bilinear Diffie-Hellman assumption holds in G. Then the 
above scheme is an IND-CCA secure KEM. 

Proof. The proof goes analogously to that of Theorem 3, with Game 3 defining
hybrid games $H_{1,0}, H_{1,1}, H_{1,2}, \ldots, H_{1,\eta}, H_{2,1}, H_{2,2}, \ldots, H_{2,\eta}, H_{3,1}, \ldots, H_{\eta,\eta}$
(for convenience, we denote with $H_{i,j}^-$ the game preceding $H_{i,j}$ in this ordering,




K. Haralambiev et al. 


e.g., $H_{2,1}^- = H_{1,\eta}$). Assuming that each two consecutive hybrid games are
indistinguishable by $\mathcal{A}$, Game 2 (which is the same as $H_{1,0}$) is indistinguishable from
$H_{\eta,\eta}$ (which is the same as Game 3). But when both $K_0^*$ and $K_1^*$ are chosen
uniformly at random, then we have


$$\Pr[W_3] = \frac{1}{2}.$$

So all we have to show is that indeed the hybrid games are indistinguishable. 

Suppose that there exists an algorithm $\mathcal{A}$ for which

$$|\Pr[E_{\eta,\eta}] - \Pr[E_{1,0}]| = 1/\mathrm{poly}_0(\kappa), \tag{9}$$

where $E_{i,j}$ denotes the event that $\mathcal{A}$ outputs 1 in $H_{i,j}$. Then there are $i^*, j^* \in
\{1, \ldots, \eta\}$ such that $|\Pr[E_{i^*,j^*}^-] - \Pr[E_{i^*,j^*}]| = 1/\mathrm{poly}(\kappa)$, where $E_{i,j}^-$ denotes the
event that $\mathcal{A}$ outputs 1 in $H_{i,j}^-$. (If no such indices exist and the difference is
negligible for all $(i, j)$, then $|\Pr[E_{\eta,\eta}] - \Pr[E_{1,0}]| = \mathrm{negl}(\kappa)$.)

Then we can construct an adversary $\mathcal{B}$ distinguishing the distributions $\Delta_{\mathrm{bdh}}$
and $\Delta_{\mathrm{rand}}$, which by Lemma 3 is sufficient to prove security under the
computational Bilinear Diffie-Hellman assumption in $\mathbb{G}$. Adversary $\mathcal{B}$ receives a challenge
$\delta = (g, A, B, C, L, R)$ as input, guesses indices $i, j \in [\eta]$, which with probability
at least $1/\eta^2$ correspond to the indices $i^*, j^*$ such that $|\Pr[E_{i^*,j^*}^-] - \Pr[E_{i^*,j^*}]| =
\max_{i,j} |\Pr[E_{i,j}^-] - \Pr[E_{i,j}]|$, and proceeds as follows:

Set-up of the public key. $\mathcal{B}$ picks random integers $d, e, f \leftarrow \mathbb{Z}_p$, and sets $X =
A^e$, $X' = A^{-et^*} g^d$, $Z_{i^*} = A$, and $Z_{j^*}' = B$, where $t^* = T_s(C)$. The rest of the
public key is generated as in the scheme definition. Note that $C, X, X', Z_{i^*}, Z_{j^*}'$
are independent and uniformly distributed group elements.

Handling decapsulation queries. When $\mathcal{A}$ issues a decapsulation query
$(C_0 = g^r, C_1)$, $\mathcal{B}$ computes $t = T_s(C_0)$ and tests the consistency of the
ciphertext by verifying

$$e(C_0, X^t X') = e(g, C_1).$$

If the equality holds, then $\mathcal{B}$ sets $K = (K_{1,1}, \ldots, K_{\eta,\eta})$ as:

- $K_{i,j} = f_{gl}(e(C_0, Z_j')^{z_i}, R)$ for $i \in [\eta] \setminus \{i^*\}$ and $j \in [\eta]$,

- $K_{i^*,j} = f_{gl}(e(C_0, Z_{i^*})^{z_j'}, R)$ for $j \in [\eta] \setminus \{j^*\}$, and

- $K_{i^*,j^*} = f_{gl}(e(\hat{X}, B), R)$, where $\hat{X} := (C_1 / C_0^d)^{1/(et - et^*)}$.

Note that

$$\hat{X} = \left((X^t X')^r / (g^r)^d\right)^{1/(et - et^*)} = \left(A^{r(et - et^*)} g^{rd} / g^{rd}\right)^{1/(et - et^*)} = A^r = \mathrm{dh}(A, C_0).$$

Since by Game 2 we have $t \neq t^*$, $\mathcal{B}$ can correctly answer all decapsulation
queries issued by $\mathcal{A}$.
Set-up of the challenge ciphertext. $\mathcal{B}$ sets $C_0^* = C$ and $C_1^* = C^d$. Note
that, by the set-up of $X, X'$, this is a consistent ciphertext, since we have

$$(X^{t^*} X')^{\log_g C} = (A^{et^*} A^{-et^*} g^d)^{\log_g C} = C^d.$$

Then $\mathcal{B}$ sets the key $K^* = (K_{1,1}^*, K_{1,2}^*, \ldots, K_{i^*,j^*}^*, \ldots, K_{\eta,\eta}^*)$ accordingly:

- the blocks before $K_{i^*,j^*}^*$ uniformly at random;

- $K_{i^*,j^*}^* = L$;

- and $K_{i,j}^* = f_{gl}(\mathrm{bdh}(C, Z_i, Z_j'), R)$ for the remaining $v$-bit blocks $K_{i,j}^*$, i.e.,
$i > i^*$ or ($i = i^* \wedge j > j^*$), which is possible because $\mathcal{B}$ knows $z_i$ or $z_j'$;

and outputs the challenge $((C_0^*, C_1^*), K^*)$.

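Now, if $\delta \leftarrow \Delta_{\mathrm{bdh}}$ then we have $L = f_{gl}(\mathrm{bdh}(A, B, C), R)$. Thus $\mathcal{A}$'s view when
interacting with $\mathcal{B}$ is identical to hybrid $H_{i^*,j^*}^-$. If $\delta \leftarrow \Delta_{\mathrm{rand}}$, then $\mathcal{A}$'s view
is identical to hybrid $H_{i^*,j^*}$. Thus $\mathcal{B}$ can use $\mathcal{A}$ to distinguish $\delta \leftarrow \Delta_{\mathrm{bdh}}$ from
$\delta \leftarrow \Delta_{\mathrm{rand}}$. □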

We remark that the above construction also extends to a Boneh-Boyen-style [4]
identity-based encryption scheme that is selective-identity secure under the
computational Bilinear Diffie-Hellman assumption. The IBE scheme has the same
parameters as the above scheme; a user secret key for an identity $id$ contains $2n$
group elements of the form $(g^{z_i z_j'} \cdot (X^{id} X')^{s_{i,j}}, g^{s_{i,j}}) \in \mathbb{G}^2$.

References 

1. Abdalla, M., Bellare, M., Rogaway, P.: The oracle Diffie-Hellman assumptions and 
an analysis of DHIES. In: Naccache, D. (ed.) CT-RSA 2001. LNCS, vol. 2020, 
pp. 143-158. Springer, Heidelberg (2001) 

2. Bellare, M., Rogaway, P.: Random oracles are practical: A paradigm for designing 
efficient protocols. In: Ashby, V. (ed.) ACM CCS 1993, pp. 62-73. ACM Press, 
New York (November 1993) 

3. Boneh, D.: The decision Diffie-Hellman problem. In: Buhler, J.P. (ed.) ANTS 1998. 
LNCS, vol. 1423, pp. 48-63. Springer, Heidelberg (1998) 

4. Boneh, D., Boyen, X.: Efficient selective-ID secure identity based encryption
without random oracles. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004.
LNCS, vol. 3027, pp. 223-238. Springer, Heidelberg (2004) 

5. Boneh, D., Canetti, R., Halevi, S., Katz, J.: Chosen-ciphertext security from 
identity-based encryption. SIAM Journal on Computing 36(5), 915-942 (2006) 

6. Boneh, D., Franklin, M.K.: Identity-based encryption from the Weil pairing. In: 
Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 213-229. Springer, Heidelberg 
(2001) 

7. Boyen, X., Mei, Q., Waters, B.: Direct chosen ciphertext security from
identity-based techniques. In: ACM CCS 2005, pp. 320-329. ACM Press, New York
(November 2005)

8. Cash, D., Kiltz, E., Shoup, V.: The twin Diffie-Hellman problem and applications. 
In: Smart, N.P. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 127-145. Springer, 
Heidelberg (2008) 




9. Cramer, R., Shoup, V.: Universal hash proofs and a paradigm for adaptive
chosen ciphertext secure public-key encryption. In: Knudsen, L.R. (ed.)
EUROCRYPT 2002. LNCS, vol. 2332, pp. 45-64. Springer, Heidelberg (2002)

10. Cramer, R., Shoup, V.: Design and analysis of practical public-key encryption
schemes secure against adaptive chosen ciphertext attack. SIAM Journal on
Computing 33(1), 167-226 (2003)

11. Freeman, D.M., Goldreich, O., Kiltz, E., Rosen, A., Segev, G.: More constructions 
of lossy and correlation-secure trapdoor functions. In: Nguyen, P.Q., Pointcheval, 
D. (eds.) PKC 2010. LNCS, vol. 6056, pp. 282-298. Springer, Heidelberg (2010) 

12. Goldreich, O.: Foundations of Cryptography: Basic Applications, vol. 2. Cambridge 
University Press, Cambridge (2004) 

13. Goldreich, O., Levin, L.A.: A hard-core predicate for all one-way functions. In: 21st 
ACM STOC, pp. 25-32. ACM Press, New York (May 1989) 

14. Hanaoka, G., Kurosawa, K.: Efficient chosen ciphertext secure public key
encryption under the computational Diffie-Hellman assumption. In: Pieprzyk, J. (ed.)
ASIACRYPT 2008. LNCS, vol. 5350, pp. 308-325. Springer, Heidelberg (2008)

15. Haralambiev, K., Jager, T., Kiltz, E., Shoup, V.: Simple and efficient public-key 
encryption from Computational Diffie-Hellman in the standard model. Cryptology 
ePrint Archive, Report 2010/033 (2010), http://eprint.iacr.org/ 

16. Hofheinz, D., Kiltz, E.: Secure hybrid encryption from weakened key encapsulation. 
In: Menezes, A. (ed.) CRYPTO 2007. LNCS, vol. 4622, pp. 553-571. Springer, 
Heidelberg (2007) 

17. Hofheinz, D., Kiltz, E.: The group of signed quadratic residues and applications. In: 
Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 637-653. Springer, Heidelberg 
(2009) 

18. Hofheinz, D., Kiltz, E.: Practical chosen ciphertext secure encryption from
factoring. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 313-332. Springer,
Heidelberg (2009)

19. Joux, A.: A one round protocol for tripartite Diffie-Hellman. Journal of
Cryptology 17(4), 263-276 (2004)

20. Kiltz, E.: Chosen-ciphertext security from tag-based encryption. In: Halevi, S., 
Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 581-600. Springer, Heidelberg 
(2006) 

21. Kiltz, E.: Chosen-ciphertext secure key-encapsulation based on gap hashed
Diffie-Hellman. In: Okamoto, T., Wang, X. (eds.) PKC 2007. LNCS, vol. 4450,
pp. 282-297. Springer, Heidelberg (2007) 

22. Kurosawa, K., Desmedt, Y.: A new paradigm of hybrid encryption scheme. In: 
Franklin, M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 426-442. Springer, 
Heidelberg (2004) 

23. Peikert, C., Waters, B.: Lossy trapdoor functions and their applications. In: 40th 
ACM STOC, pp. 187-196. ACM Press, New York (2008) 

24. Waters, B.R.: Efficient identity-based encryption without random oracles. In: 
Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 114-127. Springer, 
Heidelberg (2005) 


Constant Size Ciphertexts 
in Threshold Attribute-Based Encryption 


Javier Herranz¹, Fabien Laguillaumie², and Carla Rafols¹

¹ Dept. Matematica Aplicada IV, Universitat Politecnica de Catalunya,
C. Jordi Girona 1-3, Modul C3, 08034, Barcelona, Spain
{jherranz,crafols}@ma4.upc.edu
² GREYC - Universite de Caen Basse-Normandie,

Boulevard du Marechal Juin, BP 5186, 14032 Caen Cedex, France 
fabien.laguillaumie@unicaen.fr 


Abstract. Attribute-based cryptography has emerged in recent years
as a promising primitive for digital security. For instance, it provides good
solutions to the problem of anonymous access control. In a
ciphertext-policy attribute-based encryption scheme, the secret keys of the users
depend on their attributes. When encrypting a message, the sender chooses
which subset of attributes must be held by a receiver in order to be able
to decrypt.

All current attribute-based encryption schemes that admit reasonably 
expressive decryption policies produce ciphertexts whose size depends at 
least linearly on the number of attributes involved in the policy. In this 
paper we propose the first scheme whose ciphertexts have constant size. 
Our scheme works for the threshold case: users authorized to decrypt 
are those who hold at least t attributes among a certain universe of
attributes, for some threshold t chosen by the sender. An extension to the
case of weighted threshold decryption policies is possible. The security 
of the scheme against selective chosen plaintext attacks can be proven 
in the standard model by reduction to the augmented multi-sequence of 
exponents decisional Diffie-Hellman (aMSE-DDH) problem. 

Keywords: attribute-based encryption, provable security, pairings. 


1 Introduction 

Encryption is the cryptographic primitive which provides confidentiality to
digital communications. In a traditional public key encryption scheme, a message
is encrypted with the public key of the intended receiver, who is the only person
able to decrypt. This level of confidentiality is enough for many real-life
applications, including e-mail and key escrow. However, new situations requiring
different cryptographic functionalities appear constantly. 

Let us consider for example the case of anonymous access control: a system 
must be accessible only to those who have received the appropriate rights, which 
are defined by the system administrator. Let us imagine how such a process 
could be implemented with a standard public key encryption scheme. First, a 
user A claims that he is actually user A. Second, the system sends to this user
a challenge: a ciphertext computed with the public key of A (obtained from a 
certification authority, maybe), for some random plaintext. Third, A decrypts 
and sends back the plaintext. Fourth, if the plaintext is correct, the system checks 
if user A must have access to the system, and if so, A is accepted. This solution 
has some weaknesses, the main one being the lack of anonymity, as user A must 
reveal his identity to the system. Furthermore, each time the system wants to 
change its access control policy, it has to update the database containing all the 
users that have the right to access the system. 

A more desirable solution, employing encryption, would be as follows. First, 
in a (possibly interactive, physical) registration process, every potential user 
receives a secret key that depends on his age, his job, his company, his expertise, 
etc., in short, on his attributes. Later, the system defines its policy for access
control as a (monotonic) family of subsets of attributes: attributes in one of such 
subsets must be held by a user in order to have the right to access the system; 
in particular, in an extreme case, this policy can contain a unique subset with 
the unique attribute ‘right to access system X’. When a user tries to access 
the system, he receives as a challenge a ciphertext computed by the system, on 
a random message, using the current access policy. If the policy changes, the 
system administrator just has to take into account the new policy for generating 
the future challenges. A user is able to decrypt the challenge only if his attributes 
satisfy the considered policy. In this way, if a user answers such a challenge 
correctly, he does not leak who he is, only the fact that his attributes satisfy the 
access control policy. 

Ciphertext-policy attribute-based encryption (ABE for short, from now on) is 
the cryptographic primitive which precisely realizes the functionality described 
in the previous paragraph. This primitive can be traced back to identity-based 
encryption [Sha84] (which can be seen as the particular case of ABE where the
policy contains a single subset with a single attribute) and to fuzzy
identity-based encryption [SW05] (the particular case of ABE where the policy is always
defined by a predetermined threshold t: only users holding at least t attributes 
can decrypt). 

Related work. The first paper dealing explicitly with ABE was [GPSW06]. Two
different and complementary notions of ABE were defined there: key-policy ABE,
where a ciphertext is associated to a list of attributes, and a secret key is
associated to a policy for decryption; and ciphertext-policy ABE, where secret keys are
associated to a list of attributes (i.e. credentials of that user) and ciphertexts are
associated to policies for decryption. It seems that ciphertext-policy ABE can
be more useful for practical applications than key-policy ABE. Another related
notion is that of fuzzy identity-based encryption [SW05], which can be seen as
a particular case of both key-policy and ciphertext-policy ABE. 

A construction of a key-policy ABE scheme was provided in [GPSW06], while
the first ciphertext-policy ABE scheme was proposed in [BSW07], but its security
was proved in the generic group model. Later, a generic construction to transform
a key-policy ABE scheme into a ciphertext-policy ABE scheme was given in
[GJPS08], with the drawback that the size of the ciphertexts is $O(s^3)$, if $s$ is the
number of attributes involved in the decryption policy.

The most efficient ciphertext-policy ABE schemes in terms of ciphertext size
can be found in [Wat08, DHMR08], the size of a ciphertext depending linearly
on the number of attributes involved in the specific policy for that ciphertext.
For example, in the case of $(t, s)$-threshold decryption policies, where there are $s$
involved attributes and a user can decrypt only if he holds $t$ or more attributes,
the size of the ciphertexts in one of the schemes in [Wat08] is $s + O(1)$, whereas
the size of the ciphertexts in the scheme in [DHMR08] is $2(s - t) + O(1)$. Both
schemes admit however general policies (general monotonic access structures) 
and make use of secret sharing techniques. 

All the constructions mentioned so far only achieve security under selective 
attacks, a model in which the attacker specifies the challenge access structure 
before the setup phase. The first CP-ABE scheme with full security has appeared 
very recently [LQ+10]. The size of the ciphertexts in this scheme is $2s + O(1)$.

A concept which is more generic than attribute-based encryption is that of 
predicate encryption [KSW08]: the decryption policy, chosen by the sender of
the message, is hidden in the ciphertext, in such a way that even the receiver gets 
no information on this policy, other than the fact that his attributes satisfy it 
or not. Because of this additional strong privacy requirement, current proposals 
for predicate encryption consider quite simple (not very expressive) policies. 

We stress that all the existing proposals for ABE schemes produce ciphertexts 
whose size depends (at least) linearly on the number of attributes involved in 
the policy for that ciphertext. An exception is the scheme in [EM+09], where
ciphertexts have constant size; but this scheme admits only $(s, s)$-threshold
decryption policies. Note that for this particular threshold case where $t = s$, the
scheme in [DHMR08] already achieved constant-size ciphertexts. For more
expressive or general decryption policies, no existing scheme has short ciphertexts.
This fact can limit the applications of ABE in real life, if we consider for example 
the case of anonymous access control, with a low bandwidth available for the 
communication between the user and the system administrator. 

An essential feature of ABE schemes is their collusion resistance property, 
which guarantees that a ciphertext can leak no information about the plaintext 
to users whose attributes do not satisfy the considered policy, even if the union 
of the attributes of these colluding users satisfies the policy. This property is 
essential to guarantee a reasonable level of security in many of the applications 
of ABE schemes, like anonymous access control or access to encrypted data. 

A notion similar to ciphertext-policy ABE but without this collusion
resistance property has been considered under different names: policy-based
encryption [BM05], cryptographic work flow [AMS06], etc. This notion is actually
equivalent to the primitive of dynamic distributed identity-based encryption
[CCZ06, DHMR07, DP08, DHMR08]: the sender chooses ad-hoc a set of
identities and a monotonic access structure defined on this set; the ciphertext can be
decrypted only if users associated to the identities of some subset in the access 
structure cooperate. 
Our contribution. In this paper we propose the first collusion-resistant ABE
scheme which produces constant size ciphertexts and which admits reasonably
expressive decryption policies. Our scheme is inspired by the dynamic threshold
(identity-based) encryption scheme from [DP08], in which the ciphertext’s size
was constant as well. As we have just said, this scheme directly leads to a weak
ABE scheme, without the collusion resistance property. The challenge was to
modify this scheme in order to achieve collusion resistance without losing the
other security and efficiency properties, in particular that of constant size
ciphertexts. The resulting scheme works for threshold policies: the sender chooses
ad-hoc a set S of attributes and a threshold t, and only users who hold at least 
t of the attributes in S can decrypt. An extension is possible in order to support 
also weighted threshold policies. 

Our new scheme achieves security against selective chosen plaintext attacks
(sCPA), in the standard model, under the assumption that the augmented
multi-sequence of exponents decisional Diffie-Hellman (aMSE-DDH) problem is hard
to solve. This is essentially the same level of security that was proved for the
scheme in [DP08]. Using well-known techniques, it is possible to obtain security
against chosen ciphertext attacks (CCA), in the random oracle model. 

Organization of the paper. We define the syntax of attribute-based
encryption and the required security properties in Section 2, where we also describe
the aMSE-DDH problem, on which the security of our scheme will be based.
Section 3 contains the description of our scheme, the details on its correctness and
consistency checking, and finally the formal proof of its security. In Section 4 we
discuss how to extend our threshold scheme to the case of weighted threshold
decryption policies, and the (im)possibility to achieve CCA security from CPA
security in the standard model using a generic conversion due to [Wat08]. The
work is concluded in Section 5.

2 Preliminaries 

In this section we describe the algorithms that form an attribute-based
encryption scheme which supports threshold decryption policies, as well as the basic
security requirements for such schemes. We also introduce the computational 
problem called aMSE-DDH problem, to which we will relate the security of our 
scheme. 

2.1 Attribute-Based Encryption 

In a ciphertext-policy attribute-based encryption (ABE, for short) system, each 
user receives from a master entity a secret key which depends on the attributes 
that he satisfies (to soften the natural limitation of the unique trusted authority, 
the possibility to distribute the key extraction among several authorities has 
been investigated in [Cha07]). A sender can encrypt a message so that it can
be decrypted only by users whose attributes satisfy some policy of his choice,
and which may depend on the message. Since the basic scheme that we propose
in Section 3 works for threshold decryption policies, we describe the protocols
and security model with respect to these threshold policies: the sender chooses
a subset $S$ of attributes and a threshold $t$ such that $1 \le t \le |S|$, and encrypts
a message $m$ for the pair $(S, t)$. A particular user will be able to decrypt the
ciphertext only if he holds t or more attributes in S. The protocols and security 
model for ABE schemes supporting more general decryption policies can be 
described in a very similar way. 

Syntactic Definition. A ciphertext-policy attribute-based encryption scheme 
ABE = (Setup, Ext, Enc, Dec) supporting threshold decryption policies consists 
of four probabilistic polynomial-time algorithms: 

- The randomized setup algorithm Setup takes a security parameter $\lambda$ and a
universe of attributes $\mathcal{P} = \{at_1, \ldots, at_m\}$ as inputs and outputs some public
parameters params, containing in particular the set $\mathcal{P}$, which will be common
to all the users of the system, along with a secret key msk for the master
entity. The public parameters will be an input of all the following algorithms.
We write $(\mathsf{params}, \mathsf{msk}) \leftarrow \mathsf{ABE.Setup}(1^\lambda, \mathcal{P})$ to denote an execution of this
algorithm. 

- The key extraction algorithm Ext is an interaction between a user and the
master entity. The user proves to the master entity that he enjoys a subset
$A \subseteq \mathcal{P}$ of attributes. After verifying that this is actually the case, the master
entity uses his master secret key msk to generate a secret key $sk_A$ (which
depends on the subset $A$ of attributes), and gives it to the user. We refer to
an execution of this protocol as $sk_A \leftarrow \mathsf{ABE.Ext}(\mathsf{params}, A, \mathsf{msk})$.

- The encryption algorithm Enc takes a subset of attributes $S \subseteq \mathcal{P}$, a
threshold $t$ such that $1 \le t \le |S|$, and a message $M$ as inputs. The output is
a ciphertext $C$. We denote an execution of the encryption algorithm as
$C \leftarrow \mathsf{ABE.Enc}(\mathsf{params}, S, t, M)$.

- The decryption algorithm Dec takes a ciphertext $C$ for the pair $(S, t)$ and a
secret key $sk_A$ corresponding to some subset $A$ of attributes as inputs. The
output is a message $M$. We write $M \leftarrow \mathsf{ABE.Dec}(\mathsf{params}, C, (S, t), sk_A)$ to
refer to an execution of this protocol.

For correctness, it is required that

$$\mathsf{ABE.Dec}(\mathsf{params}, \mathsf{ABE.Enc}(\mathsf{params}, S, t, M), (S, t), sk_A) = M,$$

whenever $|A \cap S| \ge t$ and the values params, msk, $sk_A$ have been obtained by
properly executing the protocols ABE.Setup and ABE.Ext.
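The correctness condition only concerns which keys are supposed to decrypt. A small hypothetical helper makes the $(S, t)$ test concrete; the attribute names are invented for illustration. The last call shows the situation that collusion resistance, as described in the introduction, must rule out: two users who are each unauthorized, but whose pooled attributes satisfy the policy.

```python
from __future__ import annotations


def satisfies_threshold(user_attrs: set[str], S: set[str], t: int) -> bool:
    """(S, t)-threshold policy: a key for user_attrs should decrypt iff |A ∩ S| >= t."""
    if not 1 <= t <= len(S):
        raise ValueError("threshold must satisfy 1 <= t <= |S|")
    return len(user_attrs & S) >= t


if __name__ == "__main__":
    S = {"doctor", "cardiology", "hospital-A"}
    alice = {"doctor", "cardiology"}
    bob = {"hospital-A", "nurse"}
    print(satisfies_threshold(alice, S, 3))        # False
    print(satisfies_threshold(bob, S, 3))          # False
    print(satisfies_threshold(alice | bob, S, 3))  # True: pooling keys must not help
```

This only models the policy predicate. It says nothing about how the scheme enforces it cryptographically.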

Security Model for ABE Schemes. Most previous schemes (all but the
one in [LQ+10]) consider only security under selective chosen plaintext attacks.
This is also the security level that will be provably achieved by our scheme.
Indistinguishability under selective chosen plaintext attacks (IND-sCPA security,
for short) for an attribute-based encryption scheme ABE supporting threshold
decryption policies and for a security parameter $\lambda \in \mathbb{N}$ is defined by considering
the following game that an attacker $\mathcal{A}$ plays against a challenger:
1. The challenger specifies a universe of attributes $\mathcal{P}$ of size $m$ and gives it to
the attacker $\mathcal{A}$.

2. $\mathcal{A}$ selects a subset $S \subseteq \mathcal{P}$ of $s$ attributes and a threshold $t$ such that $1 \le t \le s$.

3. The challenger runs $(params, msk) \leftarrow \mathsf{ABE.Setup}(1^\lambda, \mathcal{P})$ and gives params
to $\mathcal{A}$.

4. [Secret key queries:] $\mathcal{A}$ adaptively sends subsets of attributes $B \subseteq \mathcal{P}$, with
the restriction $|B \cap S| < t$, and must receive $sk_B \leftarrow \mathsf{ABE.Ext}(params, B, msk)$
as the answer.

5. $\mathcal{A}$ outputs two messages $M_0, M_1$ of the same length.

6. [Challenge:] The challenger picks a random bit $b^* \in \{0,1\}$, computes
$C^* \leftarrow \mathsf{ABE.Enc}(params, S, t, M_{b^*})$ and gives $C^*$ to $\mathcal{A}$.

7. Step 4 is repeated.

8. $\mathcal{A}$ outputs a bit $b$.

The advantage of such an adversary $\mathcal{A}$ in breaking the IND-sCPA security of the
ABE scheme is defined as

$$\mathrm{Adv}^{\text{IND-sCPA}}_{\mathcal{A}}(\lambda) = \bigl|2\Pr[b = b^*] - 1\bigr|.$$

An attribute-based encryption scheme ABE is said to be IND-sCPA secure if
$\mathrm{Adv}^{\text{IND-sCPA}}_{\mathcal{A}}(\lambda)$ is negligible with respect to the security parameter $\lambda$, for any
polynomial time adversary $\mathcal{A}$.

Note also that collusion resistance follows from the fact that the adversary can 
make multiple adaptive secret key queries both before and after the challenge 
phase. 

This is not the strongest security notion that one can consider for ABE
schemes. On the one hand, the attacker $\mathcal{A}$ can be allowed to make decryption
queries, for ciphertexts $C'$ of his choice (corresponding to pairs $(S', t')$), with the
restriction that the challenge ciphertext $C^*$ is never queried for the challenge
pair $(S,t)$. On the other hand, $\mathcal{A}$ can be allowed to choose the challenge pair
$(S,t)$ not at the beginning of the game, but at the same time as he chooses
the two messages $M_0, M_1$. In this case, we say that $\mathcal{A}$ is a chosen ciphertext
attacker, and that his goal is to break the CCA security of the ABE scheme.

2.2 The Augmented Multi-Sequence of Exponents Diffie-Hellman Problem

Our scheme uses an admissible bilinear map (or pairing) as an ingredient, and its
security relies on the hardness of a problem that we call the augmented multi-sequence
of exponents decisional Diffie-Hellman problem, which is a slight modification
of the multi-sequence of exponents decisional Diffie-Hellman problem
considered in [DP08]. The generic complexity of these two problems is covered
by the analysis in [BBG05], because the problems fit their general Diffie-Hellman
exponent problem framework.

Let $\mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T$ be three groups of the same prime order $p$ (this is called a
bilinear group triple in the sequel), and let $e : \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$ be a
non-degenerate and efficiently computable bilinear map. Let $g_0$ be a generator of
$\mathbb{G}_1$ and let $h_0$ be a generator of $\mathbb{G}_2$. In practice, the bilinear map $e$ can be
implemented on any pairing-friendly (hyper-)elliptic curve [FST10]: no further
assumptions are made on the groups $\mathbb{G}_1$ and $\mathbb{G}_2$, nor on the hypothetical existence
of an efficient isomorphism from one to the other.

Let $\ell, \tilde m, \tilde t$ be three integers. The $(\ell, \tilde m, \tilde t)$-augmented multi-sequence of exponents
decisional Diffie-Hellman problem ($(\ell, \tilde m, \tilde t)$-aMSE-DDH) related to the
group triple $(\mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T)$ is as follows:

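Input: the vector $\vec{x}_{\ell+\tilde m} = (x_1, \ldots, x_{\ell+\tilde m})$ whose components are pairwise distinct
elements of $(\mathbb{Z}/p\mathbb{Z})^*$, which define the polynomials

$$f(X) = \prod_{i=1}^{\ell}(X + x_i) \quad \text{and} \quad g(X) = \prod_{i=\ell+1}^{\ell+\tilde m}(X + x_i),$$

the values

$$\begin{array}{lr}
g_0,\ g_0^{\gamma},\ \ldots,\ g_0^{\gamma^{\ell+\tilde t-2}},\ g_0^{\kappa\cdot\gamma\cdot f(\gamma)}, & (1.1)\\
g_0^{\omega\cdot\gamma},\ \ldots,\ g_0^{\omega\cdot\gamma^{\ell+\tilde t-2}}, & (1.2)\\
g_0^{\alpha},\ g_0^{\alpha\cdot\gamma},\ \ldots,\ g_0^{\alpha\cdot\gamma^{\ell+\tilde t}}, & (1.3)\\
h_0,\ h_0^{\gamma},\ \ldots,\ h_0^{\gamma^{\tilde m-2}},\ h_0^{\kappa\cdot g(\gamma)}, & (1.4)\\
h_0^{\omega},\ h_0^{\omega\cdot\gamma},\ \ldots,\ h_0^{\omega\cdot\gamma^{\tilde m-1}}, & (1.5)\\
h_0^{\alpha},\ h_0^{\alpha\cdot\gamma},\ \ldots,\ h_0^{\alpha\cdot\gamma^{2(\tilde m-\tilde t)+3}}, & (1.6)
\end{array}$$

where $\kappa, \alpha, \gamma, \omega$ are unknown random elements of $(\mathbb{Z}/p\mathbb{Z})^*$, and finally an element
$T \in \mathbb{G}_T$.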


Output: a bit $b$.

The problem is correctly solved if the output is $b = 1$ when $T = e(g_0, h_0)^{\kappa\cdot f(\gamma)}$,
or if the output is $b = 0$ when $T$ is a random value from $\mathbb{G}_T$. In other words, the
goal is to distinguish whether $T$ is a random value or is equal to $e(g_0, h_0)^{\kappa\cdot f(\gamma)}$.
More formally, let us denote by real the event that $T$ is indeed equal to
$e(g_0, h_0)^{\kappa\cdot f(\gamma)}$, by random the event that $T$ is a random element from $\mathbb{G}_T$, and by
$I(\vec{x}_{\ell+\tilde m}, \kappa, \alpha, \gamma, \omega, T)$ the input of the problem. Then, we define the advantage
of an algorithm $\mathcal{B}$ in solving the $(\ell, \tilde m, \tilde t)$-aMSE-DDH problem as

$$\mathrm{Adv}^{(\ell,\tilde m,\tilde t)\text{-aMSE-DDH}}_{\mathcal{B}}(\lambda) = \Bigl|\Pr\bigl[\mathcal{B}(I(\vec{x}_{\ell+\tilde m}, \kappa, \alpha, \gamma, \omega, T)) = 1 \mid \text{real}\bigr] - \Pr\bigl[\mathcal{B}(I(\vec{x}_{\ell+\tilde m}, \kappa, \alpha, \gamma, \omega, T)) = 1 \mid \text{random}\bigr]\Bigr|$$

where the probability is taken over all random choices and over the random coins
of $\mathcal{B}$.


The only difference with the multi-sequence of exponents decisional Diffie-Hellman
problem from [DP08] is the presence in the input of two additional lines, (1.2) and
(1.5). The generic hardness of this problem is a consequence of Theorem A.2 from
[BBG05]. It is stated in the next proposition, whose proof follows (almost exactly)
that of Corollary 3 in [DP08].













J. Herranz, F. Laguillaumie, and C. Rafols 


Proposition 1. For any probabilistic algorithm $\mathcal{B}$ making at most $q_G$ queries
to the oracles that compute the group operations (in groups $\mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T$ of
order $p$) and the bilinear pairing $e(\cdot,\cdot)$, its advantage in solving the aMSE-DDH
problem satisfies

$$\mathrm{Adv}^{(\ell,\tilde m,\tilde t)\text{-aMSE-DDH}}_{\mathcal{B}}(\lambda) \le \frac{(q_G + 2s + 2)^2 \cdot d}{2p},$$

where $s = 4\tilde m + 3\ell + \tilde t + 3$ and $d = \max\{2(\ell + 2), 2(\tilde m + 2), 4(\tilde m - \tilde t) + 10\}$.
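As a rough cross-check of the parameters in Proposition 1, the short Python sketch below tallies the group elements listed in lines (1.1) to (1.6), using the index ranges exactly as displayed above, and compares the total with $s = 4\tilde m + 3\ell + \tilde t + 3$. It also maps the scheme parameters $(m, s, t)$ of Theorem 1 (Section 3.3) to $(\ell, \tilde m, \tilde t)$. The helper names `line_sizes` and `amse_params` are hypothetical. Reading $s$ as a count of input elements is our interpretation and is not taken from [BBG05].

```python
def line_sizes(ell, m_t, t_t):
    """Group elements in input lines (1.1)-(1.6) of the (ell, m~, t~)-aMSE-DDH problem."""
    return {
        "1.1": (ell + t_t - 1) + 1,   # g0^(gamma^i), i = 0..ell+t~-2, plus g0^(kappa*gamma*f(gamma))
        "1.2": ell + t_t - 2,         # g0^(omega*gamma^i), i = 1..ell+t~-2
        "1.3": ell + t_t + 1,         # g0^(alpha*gamma^i), i = 0..ell+t~
        "1.4": (m_t - 1) + 1,         # h0^(gamma^i), i = 0..m~-2, plus h0^(kappa*g(gamma))
        "1.5": m_t,                   # h0^(omega*gamma^i), i = 0..m~-1
        "1.6": 2 * (m_t - t_t) + 4,   # h0^(alpha*gamma^i), i = 0..2(m~-t~)+3
    }


def amse_params(m, s, t):
    """Map the scheme parameters (m, s, t) of Theorem 1 to (ell, m~, t~)."""
    return m - s, m + t - 1, t + 1
```

For instance, with $m = 10$, $s = 4$ and $t = 3$ one gets $(\ell, \tilde m, \tilde t) = (6, 12, 4)$, and the six line sizes 10, 8, 11, 12, 12 and 20 add up to 73, which equals $4 \cdot 12 + 3 \cdot 6 + 4 + 3$.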

3 The New ABE Scheme 

This section is dedicated to the presentation of our ciphertext-policy attribute-based
encryption scheme.

In the decryption process, we will use the algorithm Aggregate of [DP08].
Given a list of values $\{g^{\frac{r}{\gamma+x_i}}, x_i\}_{1 \le i \le n}$, where $r, \gamma \in (\mathbb{Z}/p\mathbb{Z})^*$ are unknown and
$x_i \neq x_j$ if $i \neq j$, the algorithm computes the value

$$\mathsf{Aggregate}\bigl(\{g^{\frac{r}{\gamma+x_i}}, x_i\}_{1 \le i \le n}\bigr) = g^{\frac{r}{\prod_{i=1}^{n}(\gamma+x_i)}}$$

using $O(n^2)$ exponentiations.

Although the algorithm Aggregate of [DP08] is given for elements in $\mathbb{G}_T$, it is
immediate to see that it works in any group of prime order. Running Aggregate
for elements in $\mathbb{G}_1$ results, in our case, in a more efficient decryption algorithm.
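The recursion behind Aggregate is short enough to sketch. The Python code below is a minimal illustration and not the [DP08] specification. It works in a toy prime-order group: the quadratic residues modulo the safe prime 2039, which form a group of order 1019 generated by 4. This group is far too small for any security, and the name `aggregate` is our own. Each pass divides one element by another and raises the quotient to $1/(x_j - x_k)$. This strips one more factor $\gamma + x_k$ from the denominator of the exponent. The $n - 1$ passes cost $O(n^2)$ exponentiations, and the code never needs $r$ or $\gamma$.

```python
# Toy group: quadratic residues modulo the safe prime P = 2*Q + 1 (illustration only).
P = 2039          # safe prime
Q = 1019          # prime order of the subgroup
G = 4             # 4 = 2^2 is a non-trivial quadratic residue, hence of order Q


def aggregate(pairs):
    """Given [(g^(r/(gamma+x_i)), x_i)] with pairwise distinct x_i mod Q,
    return g^(r / prod_i (gamma + x_i)) without knowing r or gamma."""
    elems = [e for e, _ in pairs]
    xs = [x % Q for _, x in pairs]
    n = len(elems)
    for k in range(n - 1):
        pivot = elems[k]
        for j in range(k + 1, n):
            # (pivot / elems[j])^(1/(x_j - x_k)) removes one more factor from the
            # denominator: afterwards elems[j] = g^(r/((gamma+x_j) prod_{i<=k}(gamma+x_i)))
            ratio = pivot * pow(elems[j], -1, P) % P
            elems[j] = pow(ratio, pow(xs[j] - xs[k], -1, Q), P)
    return elems[n - 1]
```

Exponents live modulo the group order $Q$, which is why the inverse of $x_j - x_k$ is taken modulo $Q$ while group inverses are taken modulo $P$.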

3.1 Description of the Scheme
Setup, $\mathsf{ABE.Setup}(1^\lambda, \mathcal{P})$.

The master entity chooses a suitable encoding $\tau$ sending each of the $m$ attributes
$at \in \mathcal{P}$ onto a (different) element $\tau(at) = x \in (\mathbb{Z}/p\mathbb{Z})^*$. He also chooses a bilinear
group triple $(\mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T)$ of prime order $p$ (such that $p$ is $\lambda$ bits long) and a
bilinear map $e : \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$. He selects a generator $g$ of $\mathbb{G}_1$ and a generator
$h$ of $\mathbb{G}_2$.

After that, he chooses a set $\mathcal{D} = \{d_1, \ldots, d_{m-1}\}$ consisting of $m - 1$ pairwise
different elements of $(\mathbb{Z}/p\mathbb{Z})^*$, which must also be different from the values
$x = \tau(at)$, for all $at \in \mathcal{P}$. For any integer $i \le m - 1$, we denote by
$\mathcal{D}_i$ the set $\{d_1, \ldots, d_i\}$. Next, the master entity picks at random $\alpha, \gamma \in (\mathbb{Z}/p\mathbb{Z})^*$
and sets $u = g^{\alpha\gamma}$ and $v = e(g^\alpha, h)$. The master secret key is then $msk = (g, \alpha, \gamma)$
and the public parameters are

$$params = \bigl(\mathcal{P}, m, u, v, \{h^{\alpha\gamma^i}\}_{i=0,\ldots,2m-1}, \mathcal{D}, \tau\bigr).$$

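Key Extraction, $\mathsf{ABE.Ext}(params, A, msk)$.

Given any subset $A \subseteq \mathcal{P}$ of attributes, the master entity picks $r \in (\mathbb{Z}/p\mathbb{Z})^*$ at
random and computes

$$sk_A = \Bigl(\bigl\{g^{\frac{r}{\gamma+\tau(at)}}\bigr\}_{at \in A},\ \bigl\{h^{r\gamma^i}\bigr\}_{i=0,\ldots,m-2},\ h^{\frac{r-1}{\gamma}}\Bigr).$$

Encryption, $\mathsf{ABE.Enc}(params, S, t, M)$.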

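Given a subset $S \subseteq \mathcal{P}$ with $s = |S|$ attributes, a threshold $t$ satisfying $1 \le t \le s$,
and a message $M \in \mathbb{G}_T$, the sender picks at random $\kappa \in (\mathbb{Z}/p\mathbb{Z})^*$ and computes

$$\begin{cases}
C_1 = u^{-\kappa}, \\
C_2 = h^{\kappa \cdot \alpha \cdot \prod_{at \in S}(\gamma+\tau(at)) \prod_{d \in \mathcal{D}_{m+t-1-s}}(\gamma+d)}, \\
K = v^{\kappa}.
\end{cases}$$

The value $C_2$ is computed from the set $\{h^{\alpha\gamma^i}\}_{i=0,\ldots,2m-1}$ that can be found in
the public parameters. This works because its exponent is $\kappa\alpha$ times a polynomial in $\gamma$ of degree
$m + t - 1 \le 2m - 1$. The ciphertext is then $(C_1, C_2, C_3)$, where $C_3 = K \cdot M$.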

Decryption, $\mathsf{ABE.Dec}(params, (C_1, C_2, C_3), (S,t), sk_A)$.

Any user with a set of attributes $A$ such that $|A \cap S| \ge t$ can use the secret key
$sk_A$ to decrypt the ciphertext, as follows. Let $A_S$ be any subset of $A \cap S$ with
$|A_S| = t$. The user computes, from all $at \in A_S$, the value

$$\mathsf{Aggregate}\bigl(\{g^{\frac{r}{\gamma+\tau(at)}}, \tau(at)\}_{at \in A_S}\bigr) = g^{\frac{r}{\prod_{at \in A_S}(\gamma+\tau(at))}}.$$

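With the output of the algorithm Aggregate, the user computes

$$L = e\Bigl(g^{\frac{r}{\prod_{at \in A_S}(\gamma+\tau(at))}}, C_2\Bigr) = e(g,h)^{\kappa \cdot r \cdot \alpha \cdot \prod_{at \in S \setminus A_S}(\gamma+\tau(at)) \prod_{d \in \mathcal{D}_{m+t-1-s}}(\gamma+d)}.$$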


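For simplicity, we define $\tau(d) = d$ for all $d \in \mathcal{D}$ and, given a set $A_S \subseteq S$, we define
the polynomial $P_{(A_S,S)}$ by

$$P_{(A_S,S)}(\gamma) = \frac{1}{\gamma}\Bigl(\prod_{at \in (S \cup \mathcal{D}_{m+t-1-s}) \setminus A_S}(\gamma+\tau(at)) - \prod_{at \in (S \cup \mathcal{D}_{m+t-1-s}) \setminus A_S}\tau(at)\Bigr).$$

The crucial point is that, since $|A_S| = t$, the set $(S \cup \mathcal{D}_{m+t-1-s}) \setminus A_S$ has $m - 1$
elements, so the degree of the polynomial $P_{(A_S,S)}(X)$ is $m - 2$. Therefore, from the values
$\{h^{r\gamma^i}\}_{i=0,\ldots,m-2}$ included in $sk_A$, the user can compute $h^{r \cdot P_{(A_S,S)}(\gamma)}$.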
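To make the degree count concrete, the following Python sketch is a hypothetical illustration. It works with exponents modulo a toy prime $Q = 1019$ in place of $p$. It expands $\prod(X + \tau(at))$ over $(S \cup \mathcal{D}_{m+t-1-s}) \setminus A_S$ and drops the constant term, which is exactly $\prod \tau(at)$, so the division by $X$ is exact. It then combines the resulting coefficients with the exponents of the key elements $h^{r\gamma^i}$. In additive notation, $\prod_i (h^{r\gamma^i})^{c_i} = h^{r P_{(A_S,S)}(\gamma)}$ becomes $\sum_i c_i \cdot r\gamma^i$. All function names are ours.

```python
Q = 1019  # toy prime standing in for the group order p


def poly_from_roots(values):
    """Coefficients (lowest degree first) of prod_v (X + v) mod Q."""
    coeffs = [1]
    for v in values:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] + c * v) % Q       # the constant v times X^i
            nxt[i + 1] = (nxt[i + 1] + c) % Q   # X times X^i
        coeffs = nxt
    return coeffs


def decryption_poly(S_enc, dummies, A_S_enc):
    """Coefficients of P(X) = (prod (X + tau) - prod tau) / X, the product running
    over the encodings of S plus the given dummies, minus those of A_S.
    The constant term of prod (X + tau) is prod tau, so the subtraction cancels it
    and dividing by X amounts to dropping that entry."""
    remaining = [v for v in S_enc + dummies if v not in A_S_enc]
    return poly_from_roots(remaining)[1:]


def h_rP_exponent(coeffs, h_r_gamma_pows):
    """Exponent of prod_i (h^(r gamma^i))^(c_i) = h^(r P(gamma)), written additively."""
    assert len(coeffs) <= len(h_r_gamma_pows), "the key only holds h^(r gamma^i), i <= m-2"
    return sum(c * e for c, e in zip(coeffs, h_r_gamma_pows)) % Q
```

With $m = 5$, $s = 3$ and $t = 2$, the remaining set has $m - 1 = 4$ elements, so `decryption_poly` returns $m - 1 = 4$ coefficients. That is a polynomial of degree $m - 2 = 3$, exactly what the $m - 1$ key elements $h^{r\gamma^0}, \ldots, h^{r\gamma^{m-2}}$ can absorb.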

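After that, the user calculates

$$e\bigl(C_1, h^{r \cdot P_{(A_S,S)}(\gamma)}\bigr) \cdot L = e(g,h)^{\kappa \cdot r \cdot \alpha \cdot \prod_{at \in (S \cup \mathcal{D}_{m+t-1-s}) \setminus A_S}\tau(at)} \qquad (1)$$

and

$$e\bigl(C_1, h^{\frac{r-1}{\gamma}}\bigr) = e(g,h)^{-\kappa \cdot \alpha \cdot r} \cdot e(g,h)^{\kappa \cdot \alpha}. \qquad (2)$$

From Equation (1) the user can obtain

$$e(g,h)^{\kappa \cdot r \cdot \alpha} = \Bigl(e\bigl(C_1, h^{r \cdot P_{(A_S,S)}(\gamma)}\bigr) \cdot L\Bigr)^{1/\prod_{at \in (S \cup \mathcal{D}_{m+t-1-s}) \setminus A_S}\tau(at)}$$

and multiply this value into Equation (2). The result of this multiplication is
$K = e(g,h)^{\kappa \cdot \alpha}$. Finally, the user recovers the message by computing $M = C_3/K$.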
3.2 Consistency Checking and Efficiency Considerations

It is not hard to prove that the new ABE scheme satisfies the correctness property:
if all the protocols are correctly executed, and if $|A \cap S| \ge t$, then $sk_A$ allows one to
recover plaintexts that have been encrypted for the pair $(S,t)$.

It is worth noting that, by adding $g^\alpha$ to the public parameters (this modification
does not affect the security proof that we present in the next section), the
users can check the consistency of the secret key they receive from the master
entity. To do so, they must verify that, for all their attributes $at \in A$,

$$e\bigl(g^{\frac{r}{\gamma+\tau(at)}}, h^{\alpha\gamma} \cdot (h^{\alpha})^{\tau(at)}\bigr) = e(g^\alpha, h^r)$$

and then that, for $i = 1, \ldots, m-2$,

$$e(g^\alpha, h^{r\gamma^i}) = e(u, h^{r\gamma^{i-1}}).$$

Finally, they have to check that $e\bigl(u, h^{\frac{r-1}{\gamma}}\bigr) = e(g^\alpha, h^r)/v$.
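The three checks can be exercised in the same kind of toy exponent model. Elements are stored as their discrete logarithms modulo $Q = 1019$, the pairing is multiplication of exponents, and division in $\mathbb{G}_T$ is subtraction. The Python sketch below assumes a hypothetical data layout:

- the public parameters carry $g^\alpha$, $u$, $v$ and the first two elements $h^\alpha, h^{\alpha\gamma}$;
- the key lists its $\mathbb{G}_1$ parts in the same order as the encodings $\tau(at)$.

The sketch accepts an honestly generated key and rejects one in which a single component has been altered.

```python
Q = 1019  # toy prime order; group elements are represented by their exponents


def pair(a, b):
    """Toy pairing in the exponent model: e(g^a, h^b) = e(g, h)^(a * b)."""
    return a * b % Q


def check_key(pub, sk, tau_A):
    """Run the three consistency checks of Section 3.2 on a key whose attributes
    have encodings tau_A. pub holds g_alpha, u, v and h_alpha = [h^alpha, h^(alpha gamma)];
    sk holds g_parts (same order as tau_A), h_pows and h_last."""
    h_a0, h_a1 = pub["h_alpha"][0], pub["h_alpha"][1]
    h_r = sk["h_pows"][0]                                 # h^(r * gamma^0) = h^r
    target = pair(pub["g_alpha"], h_r)                    # e(g^alpha, h^r)
    # e(g^(r/(gamma+tau)), h^(alpha gamma) * (h^alpha)^tau) = e(g^alpha, h^r)
    for part, x in zip(sk["g_parts"], tau_A):
        if pair(part, (h_a1 + x * h_a0) % Q) != target:
            return False
    # e(g^alpha, h^(r gamma^i)) = e(u, h^(r gamma^(i-1))) for i = 1, ..., m-2
    hp = sk["h_pows"]
    for i in range(1, len(hp)):
        if pair(pub["g_alpha"], hp[i]) != pair(pub["u"], hp[i - 1]):
            return False
    # e(u, h^((r-1)/gamma)) = e(g^alpha, h^r) / v; division in G_T is subtraction here
    return pair(pub["u"], sk["h_last"]) == (target - pub["v"]) % Q
```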

In terms of efficiency, the main contribution of this new scheme is the constant
size of the ciphertext, which consists of one element of each group $\mathbb{G}_1$, $\mathbb{G}_2$ and
$\mathbb{G}_T$. The encryption requires no pairing computations, but $m + t + 1$ exponentiations.
The decryption process requires 3 pairing evaluations and $O(t^2 + m)$
exponentiations. The size of the secret key is linear in the number of attributes,
as in all existing ABE schemes.

3.3 Security Analysis 

We are going to prove that our scheme is IND-sCPA secure, assuming that the
aMSE-DDH problem is hard to solve.

Theorem 1. Let $\lambda$ be an integer. For any adversary $\mathcal{A}$ against the IND-sCPA
security of our attribute-based encryption scheme, for a universe of $m$ attributes
$\mathcal{P}$ and a challenge pair $(S,t)$ with $s = |S|$, there exists a solver $\mathcal{B}$ of the $(\ell, \tilde m, \tilde t)$-aMSE-DDH
problem, for $\ell = m - s$, $\tilde m = m + t - 1$ and $\tilde t = t + 1$, such that

$$\mathrm{Adv}^{\text{aMSE-DDH}}_{\mathcal{B}}(\lambda) \ge \frac{1}{2}\,\mathrm{Adv}^{\text{IND-sCPA}}_{\mathcal{A}}(\lambda).$$

Proof. We are going to construct an algorithm $\mathcal{B}$ that uses the adversary $\mathcal{A}$ as a
black box and that solves the $(m - s, m + t - 1, t + 1)$-augmented multi-sequence
of exponents decisional Diffie-Hellman problem. The main trick in the proof will
be to use the input of the aMSE-DDH problem to compute evaluations of some
polynomials in $\gamma$ "in the exponent".

Let $I(\vec{x}_{2m+t-1-s}, \kappa, \alpha, \gamma, \omega, T)$ be the input of the algorithm $\mathcal{B}$. First, $\mathcal{B}$ specifies
a universe of attributes $\mathcal{P} = \{at_1, \ldots, at_m\}$. Next, the adversary $\mathcal{A}$ chooses
a set $S \subseteq \mathcal{P}$ of cardinality $s$ that he wants to attack, and a threshold $t$ such that
$1 \le t \le s$. Without loss of generality, we assume $S = \{at_{m-s+1}, \ldots, at_m\} \subseteq \mathcal{P}$. From
now on, we will denote by $A_S$ the subset $A \cap S$, for any subset of attributes $A$.
Simulation of the setup. The algorithm $\mathcal{B}$ defines the encoding of the attributes
as $\tau(at_i) = x_i$ for $i = 1, \ldots, m$. Observe that the encodings of the first
$m - s$ attributes are the opposites of the roots of $f(X)$, and the encodings of the
attributes in $S$ are the opposites of some roots of $g(X)$.

The values corresponding to the "dummy" attributes $\mathcal{D} = \{d_1, \ldots, d_{m-1}\}$ are
defined as $d_j = x_{m+j}$ for $j = 1, \ldots, m + t - 1 - s$. For $j = m + t - s, \ldots, m - 1$,
the $d_j$'s are picked uniformly at random in $(\mathbb{Z}/p\mathbb{Z})^*$ until they are distinct from

$$\{x_1, \ldots, x_{2m+t-1-s}, d_{m+t-s}, \ldots, d_{j-1}\}.$$

The algorithm $\mathcal{B}$ defines $g := g_0^{f(\gamma)}$. Note that $\mathcal{B}$ can compute $g$ with the
elements of line (1.1) of its input, since $f$ is a polynomial of degree $\ell$. To complete
the setup phase, $\mathcal{B}$ sets $h = h_0$ and computes

— $u = g^{\alpha\gamma} = g_0^{\alpha\gamma f(\gamma)}$ with line (1.3) of its input, which is possible since $Xf(X)$
is a polynomial of degree $\ell + 1$. Indeed, $\alpha \cdot \gamma \cdot f(\gamma)$ is a linear combination
of $\{\alpha\gamma, \ldots, \alpha\gamma^{\ell+1}\}$ and the coefficients of this linear combination are known
to $\mathcal{B}$, so the value $u$ can be computed from line (1.3).

— $v = e(g,h)^\alpha = e(g_0^{\alpha f(\gamma)}, h_0)$ with line (1.3) for $g_0^{\alpha f(\gamma)}$. Note that the value
$g^\alpha$ could be computed by $\mathcal{B}$ and added to the public parameters, in case the
verification of the consistency of the secret keys is desired for the scheme.

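The algorithm $\mathcal{B}$ can compute the values $\{h^{\alpha\gamma^i}\}_{i=0,\ldots,2m-1}$ from line (1.6) of its
input, since $2(\tilde m - \tilde t) + 3 = 2m - 1$. Eventually, $\mathcal{B}$ gives to $\mathcal{A}$ the resulting

$$params = \bigl(\mathcal{P}, m, u, v, \{h^{\alpha\gamma^i}\}_{i=0,\ldots,2m-1}, \mathcal{D}, \tau\bigr).$$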


Simulation of key extraction queries. Whenever the adversary $\mathcal{A}$ makes a
key extraction query for a subset of attributes $A = \{at_{i_1}, \ldots, at_{i_n}\} \subseteq \mathcal{P}$ satisfying
$0 \le |A_S| \le t - 1$, the algorithm $\mathcal{B}$ must produce a tuple of the form

$$sk_A = \Bigl(\bigl\{g^{\frac{r}{\gamma+\tau(at)}}\bigr\}_{at \in A},\ \bigl\{h^{r\gamma^i}\bigr\}_{i=0,\ldots,m-2},\ h^{\frac{r-1}{\gamma}}\Bigr)$$

for some random value $r \in (\mathbb{Z}/p\mathbb{Z})^*$. To do so, $\mathcal{B}$ implicitly defines $r = (\omega y_A \gamma + 1)\,Q_A(\gamma)$,
where $y_A$ is randomly picked in $(\mathbb{Z}/p\mathbb{Z})^*$, and the polynomial $Q_A(X)$
is defined as $Q_A(X) = 1$ when $|A_S| = 0$, or

$$Q_A(X) = \lambda_A \cdot \prod_{at \in A_S}(X + \tau(at))$$

otherwise, in which case $\lambda_A = \bigl(\prod_{at \in A_S}\tau(at)\bigr)^{-1}$.

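The elements which form $sk_A$ are then computed as follows:

— For any $at \in A_S$, $\mathcal{B}$ defines

$$Q_{at}(\gamma) = \frac{Q_A(\gamma)}{\gamma + \tau(at)} = \lambda_A \cdot \prod_{at' \in A_S,\ at' \neq at}(\gamma + \tau(at')).$$

Then $g^{\frac{r}{\gamma+\tau(at)}} = g_0^{f(\gamma)\,\omega y_A \gamma\, Q_{at}(\gamma)} \cdot g_0^{f(\gamma)\,Q_{at}(\gamma)}$. The first factor of the product
(whose exponent is $\omega$ times a polynomial in $\gamma$ with zero constant term and of degree at most $(m - s) + 1 + t - 2$)
can be computed from line (1.2), whereas the second factor (whose exponent
is a polynomial in $\gamma$ of degree at most $(m - s) + t - 2$) can be computed from
line (1.1).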
— For any $at \in A \setminus A_S$, $\mathcal{B}$ defines the polynomial $f_{at}(X) = f(X)/(X + \tau(at))$ (recall that
$-\tau(at)$ is then a root of $f$). Then $g^{\frac{r}{\gamma+\tau(at)}} = g_0^{f_{at}(\gamma)\,\omega y_A \gamma\, Q_A(\gamma)} \cdot g_0^{f_{at}(\gamma)\,Q_A(\gamma)}$. Again, the first factor of this
product can be computed from line (1.2), and the second factor can be computed
from line (1.1).

— The values $\{h^{r\gamma^i}\}_{i=0,\ldots,m-2}$ can be computed from lines (1.4) and (1.5), since

$$h^{r\gamma^i} = h_0^{Q_A(\gamma)\,\omega y_A \gamma^{i+1}} \cdot h_0^{Q_A(\gamma)\,\gamma^i}.$$

— Finally, $\mathcal{B}$ has to compute $h^{\frac{r-1}{\gamma}} = h_0^{Q_A(\gamma)\,\omega y_A} \cdot h_0^{\frac{Q_A(\gamma)-1}{\gamma}}$. The first factor of
the product can be computed from line (1.5) and the second factor can be
computed from line (1.4). This is because, by definition of $\lambda_A$, $Q_A(X)$ is a polynomial
with independent term equal to 1, and thus $\frac{Q_A(\gamma)-1}{\gamma}$ is a linear combination
of $\{1, \gamma, \ldots, \gamma^{t-2}\}$.

Note that $Q_A(\gamma) \neq 0$: otherwise $\gamma = -\tau(at)$ for some $at \in A_S$, and $\gamma$ would then be known,
since the encodings are public. Given this, it is not hard to see that $r$ is uniformly distributed in $\mathbb{Z}/p\mathbb{Z}$. If
the choice of $y_A$ leads to $r = 0$ (which occurs only with negligible probability
anyhow), it suffices to pick a different value for $y_A$. That is, in the simulation $r$
is uniformly distributed in $(\mathbb{Z}/p\mathbb{Z})^*$.
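The normalization $\lambda_A = (\prod_{at \in A_S}\tau(at))^{-1}$ is what makes $Q_A(0) = 1$. That in turn lets $\mathcal{B}$ split $\frac{r-1}{\gamma}$ into $\omega y_A Q_A(\gamma) + \frac{Q_A(\gamma)-1}{\gamma}$ without ever dividing by $\gamma$ in the exponent. The Python sketch below checks this identity numerically modulo a toy prime $Q = 1019$ standing in for $p$. The function names are hypothetical. The values $\gamma$ and $\omega$ are known here only because this is a numerical test, not a simulation.

```python
Q = 1019  # toy prime standing in for p


def q_A(tau_AS):
    """Coefficients (lowest degree first) of Q_A(X) = lambda_A * prod (X + tau(at)),
    with lambda_A = (prod tau(at))^(-1); Q_A = 1 when A_S is empty."""
    coeffs, prod = [1], 1
    for v in tau_AS:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] + c * v) % Q
            nxt[i + 1] = (nxt[i + 1] + c) % Q
        coeffs, prod = nxt, prod * v % Q
    lam = pow(prod, -1, Q)
    return [c * lam % Q for c in coeffs]


def evaluate(coeffs, x):
    return sum(c * pow(x, i, Q) for i, c in enumerate(coeffs)) % Q


def split_r_minus_1_over_gamma(tau_AS, omega, y_A, gamma):
    """With r = (omega*y_A*gamma + 1) * Q_A(gamma), return ((r - 1)/gamma computed
    directly, and the same value rebuilt from omega*y_A*Q_A(gamma) and (Q_A - 1)/X)."""
    q = q_A(tau_AS)
    r = (omega * y_A * gamma + 1) * evaluate(q, gamma) % Q
    direct = (r - 1) * pow(gamma, -1, Q) % Q
    shifted = q[1:]                      # (Q_A(X) - 1)/X, exact because q[0] == 1
    via_lines = (omega * y_A * evaluate(q, gamma) + evaluate(shifted, gamma)) % Q
    return direct, via_lines
```

When $|A_S| \le t - 1$, the list `q_A(tau_AS)[1:]` has at most $t - 1$ entries. So $\frac{Q_A(\gamma)-1}{\gamma}$ only involves $1, \gamma, \ldots, \gamma^{t-2}$, as claimed above.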

Simulation of the challenge. Once $\mathcal{A}$ sends to $\mathcal{B}$ the two messages $M_0$ and
$M_1$, $\mathcal{B}$ flips a coin $b \in \{0,1\}$ and sets $C_3 = T \cdot M_b$. To simulate the rest of the
challenge ciphertext, $\mathcal{B}$ implicitly defines the randomness for the encryption as
$\kappa' = \kappa/\alpha$, and sets $C_2 = h_0^{\kappa \cdot g(\gamma)}$ (given in line (1.4) of the aMSE-DDH input). To
complete the ciphertext, $\mathcal{B}$ computes $C_1 = g_0^{-\kappa\cdot\gamma\cdot f(\gamma)}$ from line (1.1) of the
input, which is equal to $u^{-\kappa'}$.
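The index bookkeeping behind the challenge simulation can be checked numerically. With the encodings fixed in the simulation of the setup, the attributes of $S$ and the dummies $d_1, \ldots, d_{m+t-1-s}$ are encoded by $x_{\ell+1}, \ldots, x_{2m+t-1-s}$. The product in the exponent of an honest $C_2$ is therefore exactly $g(\gamma)$. The Python sketch below works with exponents relative to $g_0$, $h_0$ and $e(g_0, h_0)$ modulo a toy prime $Q = 1019$, an assumption for illustration only. It encrypts honestly with $\kappa' = \kappa/\alpha$ and compares the result with the simulated $(C_1, C_2)$ and with $T$ in the real case. Function names are hypothetical.

```python
Q = 1019  # toy prime order; all group elements are tracked by their exponents


def prod_plus(values, gamma):
    """prod_v (gamma + v) mod Q."""
    out = 1
    for v in values:
        out = out * (gamma + v) % Q
    return out


def challenge_matches(m, s, t, xs, kappa, alpha, gamma):
    """Compare an honest encryption under kappa' = kappa/alpha with the simulated
    challenge. xs = (x_1, ..., x_{2m+t-1-s}); exponents are taken w.r.t. g0, h0, e(g0, h0)."""
    ell = m - s
    f = prod_plus(xs[:ell], gamma)                 # f(gamma)
    g_poly = prod_plus(xs[ell:], gamma)            # g(gamma)
    S_enc = xs[ell:m]                              # tau(at_i) = x_i for at_i in S
    dummies = xs[m:m + (m + t - 1 - s)]            # d_j = x_{m+j}
    kappa_p = kappa * pow(alpha, -1, Q) % Q
    # Honest encryption with g = g0^f(gamma), h = h0, u = g^(alpha gamma), v = e(g,h)^alpha
    c1 = -kappa_p * alpha * gamma * f % Q
    c2 = kappa_p * alpha * prod_plus(S_enc + dummies, gamma) % Q
    K = kappa_p * alpha * f % Q
    # Simulation: C1 from line (1.1), C2 from line (1.4), T in the real case
    sim_c1 = -kappa * gamma * f % Q
    sim_c2 = kappa * g_poly % Q
    T_real = kappa * f % Q
    return (c1, c2, K) == (sim_c1, sim_c2, T_real)
```

In the real case $K = T$, so $C_3 = T \cdot M_b$ is a correctly formed ciphertext component. In the random case $C_3$ hides $M_b$ perfectly.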

After the challenge step, $\mathcal{A}$ may make other key extraction queries, which are
answered as before.

Guess. Finally, $\mathcal{A}$ outputs a bit $b'$. If $b' = b$, $\mathcal{B}$ answers 1 as the solution to
the given instance of the aMSE-DDH problem, meaning that $T = e(g_0, h_0)^{\kappa\cdot f(\gamma)}$.
Otherwise, $\mathcal{B}$ answers 0, meaning that $T$ is a random element.

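We now have to analyze the advantage of the algorithm $\mathcal{B}$:

$$\begin{aligned}
\mathrm{Adv}^{\text{aMSE-DDH}}_{\mathcal{B}}(\lambda) &= \Bigl|\Pr\bigl[\mathcal{B}(I(\vec{x}_{\ell+\tilde m}, \kappa, \alpha, \gamma, \omega, T)) = 1 \mid \text{real}\bigr] - \Pr\bigl[\mathcal{B}(I(\vec{x}_{\ell+\tilde m}, \kappa, \alpha, \gamma, \omega, T)) = 1 \mid \text{random}\bigr]\Bigr| \\
&= \bigl|\Pr[b = b' \mid \text{real}] - \Pr[b = b' \mid \text{random}]\bigr|.
\end{aligned}$$

When the event real occurs, $\mathcal{A}$ is playing a real attack and therefore
$\bigl|\Pr[b = b' \mid \text{real}] - 1/2\bigr| = \frac{1}{2}\mathrm{Adv}^{\text{IND-sCPA}}_{\mathcal{A}}(\lambda)$. In the random case, the view of
$\mathcal{A}$ is completely independent of the bit $b$, so the probability $\Pr[b = b']$
is equal to $1/2$. Summing up, we obtain

$$\mathrm{Adv}^{\text{aMSE-DDH}}_{\mathcal{B}}(\lambda) \ge \frac{1}{2}\mathrm{Adv}^{\text{IND-sCPA}}_{\mathcal{A}}(\lambda).$$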


□ 
4 Extensions

In this section we discuss two possible extensions of the basic scheme that we have 
described and analyzed in the previous section. First, we study the possibility 
of supporting more general decryption policies, not only threshold ones. After 
that, we discuss the options to obtain security against chosen ciphertext attacks. 

4.1 More General Decryption Policies 

Although we have considered in this paper the special case of threshold decryption
policies, attribute-based encryption schemes can be defined for general
decryption policies. Such a policy is determined by a monotone increasing family
$\Gamma \subseteq 2^{\mathcal{P}}$ of subsets of attributes in $\mathcal{P} = \{at_1, \ldots, at_m\}$. This family (or access
structure) is chosen by the sender at the time of encryption, in such a way that
only users whose subset of attributes $A$ belongs to $\Gamma$ can decrypt. Even if many
users collude, each of them holding a subset of attributes outside $\Gamma$, the encryption
scheme must remain secure.

The threshold ABE scheme that we have described and analyzed in this paper
is inspired by the dynamic threshold identity-based encryption scheme of
[DP08]. It is claimed in [DP08] that the threshold scheme there can be extended
to admit "all the classical cases" of more general access structures.
However, this is not completely true, because their extension only applies to
a subfamily of access structures, the weighted threshold ones. A family $\Gamma \subseteq 2^{\mathcal{P}}$ is
a weighted threshold access structure if there exist a threshold $t$ and an assignment
of weights $w : \mathcal{P} \to \mathbb{Z}^+$ such that $A \in \Gamma \iff \sum_{at \in A} w(at) \ge t$. Of course,
there are many access structures which are not weighted threshold. An example is
the access structure on $\mathcal{P} = \{at_1, at_2, at_3, at_4\}$ whose minimal authorized subsets are
$\{at_1, at_2\}$, $\{at_2, at_3\}$ and $\{at_3, at_4\}$.
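The claim about this example has a short proof. Suppose $w$ and $t$ realized it. The authorized sets give $w(at_1) + w(at_2) \ge t$ and $w(at_3) + w(at_4) \ge t$, so the total weight is at least $2t$. The unauthorized sets $\{at_1, at_3\}$ and $\{at_2, at_4\}$ give $w(at_1) + w(at_3) < t$ and $w(at_2) + w(at_4) < t$, so the total weight is below $2t$, a contradiction. The Python sketch below (hypothetical helper names, with attributes labelled 1 to 4) illustrates the same fact by exhaustive search over small weights. As a sanity check of the search itself, it also finds weights for the ordinary 2-out-of-4 threshold structure.

```python
from itertools import combinations, product


def authorized(subset, minimal_sets):
    """Monotone access structure: authorized iff it contains a minimal authorized set."""
    return any(ms <= subset for ms in minimal_sets)


def realizes(weights, t, attrs, minimal_sets):
    """True iff the weighted threshold rule (weights, t) defines exactly this structure."""
    for k in range(len(attrs) + 1):
        for comb in combinations(attrs, k):
            chosen = set(comb)
            weighted = sum(weights[a] for a in chosen) >= t
            if weighted != authorized(chosen, minimal_sets):
                return False
    return True


def find_weights(attrs, minimal_sets, max_weight):
    """Search positive weights up to max_weight, and all thresholds, for a realization."""
    for ws in product(range(1, max_weight + 1), repeat=len(attrs)):
        weights = dict(zip(attrs, ws))
        for t in range(1, sum(ws) + 1):
            if realizes(weights, t, attrs, minimal_sets):
                return weights, t
    return None
```

The search over weights up to 5 only illustrates the claim; the argument above covers all weights.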

The same extension proposed in [DP08] works for our threshold ABE scheme.
Let $K$ be an upper bound for $w(at)$, for all $at \in \mathcal{P}$ and for all possible assignments
of weights that realize weighted threshold decryption policies. During
the setup of the ABE scheme, the new universe of attributes will be
$\mathcal{P}' = \{at_1\|1, at_1\|2, \ldots, at_1\|K, \ldots, at_m\|1, \ldots, at_m\|K\}$. During the secret key request
phase, if an attribute $at$ belongs to the requested subset $A \subseteq \mathcal{P}$, the secret key
$sk_A$ will contain the elements $g^{\frac{r}{\gamma+\tau(at_j)}}$ corresponding to $at_j = at\|j$, for all
$j = 1, \ldots, K$.

Later, suppose a sender wants to encrypt a message for a weighted threshold
decryption policy $\Gamma$, defined on a subset of attributes $S = \{at_1, \ldots, at_s\}$ (without
loss of generality). Let $t$ and $w : S \to \mathbb{Z}^+$ be the threshold and assignment of
weights that realize $\Gamma$. The sender can use the threshold ABE encryption routine
described in Section 3.1 with threshold $t$, but applied to the set of attributes
$S' = \{at_1\|1, \ldots, at_1\|w(at_1), \ldots, at_s\|1, \ldots, at_s\|w(at_s)\}$. In this way, if a user
holds a subset of attributes $A \in \Gamma$, he will have $w(at)$ valid elements in his
secret key for each attribute $at \in A$. In total, he will have $\sum_{at \in A} w(at) \ge t$ valid
elements, so he will be able to run the decryption routine of the threshold ABE
scheme and decrypt the ciphertext.
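The expansion can be sketched in a few lines of Python. Here the concatenation $at\|j$ is modelled as a tuple `(at, j)`. The valid key elements are simply the labels shared by $S'$ and the user's expanded attribute set. These representations and the function names are illustrative assumptions. Since every weight is at most $K$, a user with attribute set $A$ ends up with exactly $\sum_{at \in A \cap S} w(at)$ usable elements, so the threshold test below coincides with the weighted threshold condition.

```python
def expand(attrs, weights):
    """S' = {at||1, ..., at||w(at) : at in attrs}, with at||j modelled as (at, j)."""
    return {(a, j) for a in attrs for j in range(1, weights[a] + 1)}


def key_labels(A, K):
    """Labels at||1, ..., at||K for which a key for A contains an element."""
    return {(a, j) for a in A for j in range(1, K + 1)}


def can_decrypt(A, S, weights, t, K):
    """Threshold test of the basic scheme applied to the expanded sets."""
    assert all(1 <= weights[a] <= K for a in S), "weights must lie in 1..K"
    return len(expand(S, weights) & key_labels(A, K)) >= t
```

For instance, with weights 3, 2, 1 on attributes `a`, `b`, `c`, threshold 4 and $K = 3$, the set `{a, c}` can decrypt (weight 4) but `{b, c}` cannot (weight 3).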
The security analysis can be extended to this more general case, as well.
Therefore, we can conclude that our ABE scheme with constant size ciphertexts
also admits weighted threshold decryption policies.

4.2 Security under Chosen Ciphertext Attacks 

Some ABE schemes proposed in the literature [BSW07, CN07, Wat08] achieve
security under selective chosen ciphertext attacks (sCCA security). This is done
in two steps. Firstly, sCPA security is proved, and secondly the scheme is shown
to admit delegation of secret keys: it is possible to compute a valid secret key
$sk_{A'}$ from a valid secret key $sk_A$, for any $A' \subseteq A$. If this is the case, the basic
ABE scheme can be viewed as a hierarchical ABE scheme, where the hierarchy
is the classical one: a user holding attributes $A$ is above a user holding attributes
$A'$ if $A' \subseteq A$. Finally, the techniques developed in [CHK04] can be applied to
this sCPA-secure hierarchical ABE scheme, which results in an sCCA-secure ABE
scheme in the standard model.

Unfortunately, our scheme does not seem to admit delegation of secret keys.
Therefore, it is still an open problem to come up with an ABE scheme with
constant size ciphertexts achieving sCCA security in the standard model. In
contrast, if one requires security in the random oracle model only, such a result
is easily obtained by applying to our scheme (a variant of) some classical CPA
to CCA transformation, such as the Fujisaki-Okamoto one [FuOk99].

5 Conclusion 

We have proposed in this paper the first (reasonably expressive) attribute-based
encryption scheme with constant size ciphertexts. The design of the scheme is
inspired by the dynamic threshold encryption scheme in [DP08]. Our ABE scheme
works for threshold policies: the sender chooses, at the time of encryption, the
involved set of attributes and a threshold, in such a way that only those users
holding (at least) this threshold of the involved attributes can decrypt. However,
the scheme can be easily extended to admit weighted threshold decryption
policies, as well.

Although finding attribute-based encryption schemes with short ciphertexts
supporting even more expressive decryption policies is an important open problem,
weighted threshold decryption policies are quite expressive and can cover a
wide range of applications. Therefore, we think that our proposal achieves a fair
trade-off between expressiveness and efficiency.

Our scheme employs bilinear pairings, and its security is based on the
assumption that a newly introduced problem, the augmented Multi-Sequence of
Exponents Decisional Diffie-Hellman (aMSE-DDH) problem, is hard. It remains
an open problem to obtain a scheme with constant ciphertext length whose
security is based on a more standard algorithmic problem and which achieves
full security (i.e., not only selective security).
Abstract. In this paper, we fully break the Algebraic Surface Cryptosystem
(ASC for short) proposed at PKC'2009 [3]. This system is
based on an unusual problem in multivariate cryptography: the Section
Finding Problem. Given an algebraic surface $X(x,y,t) \in \mathbb{F}_p[x,y,t]$ such
that $\deg_{xy} X(x,y,t) = w$, the question is to find a pair of polynomials
of degree $d$, $u_x(t)$ and $u_y(t)$, such that $X(u_x(t), u_y(t), t) = 0$. In ASC,
the public key is the surface, and the secret key is the section. This
asymmetric encryption scheme enjoys reasonable key sizes: for
recommended parameters, the size of the secret key is only 102 bits and
the size of the public key is 500 bits. In this paper, we propose a message
recovery attack whose complexity is quasi-linear in the size of the
secret key. The main idea of this algebraic attack is to decompose ideals
deduced from the ciphertext in order to avoid solving the section finding
problem. Experimental results show that we can break the cipher for
recommended parameters (the security level is $2^{102}$) in 0.05 seconds.
Furthermore, the attack still applies even when the secret key is very large
(more than 10000 bits). The complexity of the attack is $\tilde{O}(w^7 d \log(p))$,
which is polynomial with respect to all security parameters. In particular,
it is quasi-linear in the size of the secret key, which is $(2d + 2)\log(p)$.
This result is rather surprising since the algebraic attack is often more
efficient than the legal decryption algorithm.

Keywords: Multivariate Cryptography, Algebraic Cryptanalysis,
Section Finding Problem (SFP), Gröbner bases, Decomposition of ideals.


1 Introduction 

In 1994, Shor designed a quantum algorithm to efficiently compute discrete logarithms
and factorizations [16]. Hence, if one could construct a quantum computer,
a huge number of well-established public key cryptosystems (for instance, RSA
or elliptic curve based systems) would be seriously threatened. Therefore,
cryptographers are continuously searching for post-quantum alternatives. The
first step to design new cryptosystems is to identify hard problems to use as
trapdoors. So far, most of the problems used in post-quantum cryptology can
be classified into three main categories: Multivariate cryptography, Code-based
cryptography and Lattice-based cryptography.

In this context, Akiyama, Goto, and Miyake propose a new multivariate public-key
algorithm at PKC'2009: the Algebraic Surface Cryptosystem (ASC for short)
[3]. Interestingly, its security is based on a difficult problem which is not common:

Section Finding Problem (SFP). Given an algebraic surface defined by the
polynomial $X(x,y,t) \in \mathbb{F}_p[x,y,t]$ (where $\mathbb{F}_p$ denotes the finite field of cardinality
$p$), the question is to find two polynomials $u_x(t), u_y(t) \in \mathbb{F}_p[t]$ of degree $d$, such
that $X(u_x(t), u_y(t), t) = 0$.
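To fix ideas, checking that a candidate pair is a section is easy; the hard part is finding one. The Python sketch below is a hypothetical illustration, not the ASC key format. It stores $X$ as a dictionary mapping $(i,j)$ to the coefficient polynomial $c_{ij}(t)$ of $x^i y^j$. Each coefficient polynomial is a list of elements of $\mathbb{F}_p$, lowest degree first, with a toy $p = 7$. The code substitutes $x = u_x(t)$ and $y = u_y(t)$ and tests whether the result vanishes in $\mathbb{F}_p[t]$. The keys with non-zero coefficients form exactly the support $\Lambda_X$ used in Section 2.

```python
P = 7  # toy field size; the recommended ASC parameters use p = 2


def trim(a):
    """Drop high-degree zero coefficients (keep at least one entry)."""
    while len(a) > 1 and a[-1] == 0:
        a = a[:-1]
    return a


def padd(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x + y) % P for x, y in zip(a, b)])


def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return trim(out)


def ppow(a, e):
    out = [1]
    for _ in range(e):
        out = pmul(out, a)
    return out


def support(X):
    """Lambda_X: the pairs (i, j) whose coefficient c_ij(t) is non-zero."""
    return {ij for ij, c in X.items() if any(c)}


def is_section(X, ux, uy):
    """X maps (i, j) to c_ij(t), so X(x, y, t) = sum c_ij(t) x^i y^j.
    Returns True iff X(ux(t), uy(t), t) = 0 in F_p[t]."""
    total = [0]
    for (i, j), c in X.items():
        total = padd(total, pmul(pmul(c, ppow(ux, i)), ppow(uy, j)))
    return all(v == 0 for v in total)
```

For instance, the toy surface $X = xy + (t - u_x(t))\,y - t\,u_y(t)$ vanishes on $(u_x, u_y)$ by construction. Replacing $u_y$ by any other polynomial leaves the non-zero remainder $t\,(u_y' - u_y)$.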

As stated in [3], this problem is computationally hard: the only algorithm
known so far amounts to finding the roots of a huge multivariate polynomial system.
Hence the idea of ASC is to use the surface as public key and the knowledge of a
section of this surface as the trapdoor. In comparison to HFE [15] or other
multivariate systems, ASC has some interesting and unusual properties. In particular,
the keys are unexpectedly short. The security of multivariate systems is usually
related to the difficulty of finding a zero of a system of low degree polynomials
(often quadratic) in a huge number of variables. For instance, in the case of HFE,
the size of the public key is precisely the size of the multivariate system: 265680
bits for a security of $2^{80}$. In contrast with HFE, ASC enjoys a small public key
of 500 bits for a security of $2^{102}$. More generally, for a security level of $2^d$, the
size of the public key of HFE is $O(d^3)$. In comparison, the public key of ASC
is a unique high degree polynomial in only three variables: its size is $O(d)$ bits
for a security of $2^d$. Actually, the authors explain that the keys of ASC are
among the shortest of known post-quantum cryptosystems. More precisely, let
$w$ denote the degree of the public surface $X$ in $x$ and $y$. For a security level of
$p^{2d}$, the size of the secret key is $2d\log(p)$ bits and the size of the public key is
about $wd\log(p)$. The main observation is that the sizes of the keys are linear in
$d\log(p)$, which is the logarithm of the security level.

Although a completely different version of ASC [2] has been attacked by 
Ivanov and Voloch El, by Uchiyama and Tokunaga m and by Iwami (T2J , the 
new version of ASC, presented at PKC’2009, is resistant to all known attacks. 
We would like to mention that the decryption algorithm raises some questions.
Indeed, one step of this algorithm is to recover some factors of given degree $D$
of a univariate polynomial. In order to find those factors, the designers propose
to recombine the irreducible factors of the polynomial by solving a knapsack problem.
However, this problem is known to be NP-hard [10]. Therefore, it is not clear whether
the cryptosystem remains practical for high security parameters.

Main Results. In this paper, we describe a message recovery attack which 
can break ASC in polynomial time. One important step of the legal decryption 
algorithm is the factorization of a univariate polynomial. The key idea of the 
algebraic attack is to perform this factorization step implicitly by decomposing 
ideals deduced from the ciphertext. Indeed, decomposition of ideals can be seen 
as a generalisation of the standard factorization of polynomials. Hence, this 
technique allows us to bypass the Section Finding Problem, which is hard. 
We present three versions of this attack. The Level 1 Attack is high-level and
deterministic, offers a good view of the mechanisms involved, and can be
implemented straightforwardly in a Computer Algebra System such as Magma
(code given in Appendix O). However, this version is not very efficient and
cannot break ASC for the recommended parameters. The Level 2 Attack is based on
the following observation: the polynomials occurring in ASC have a high degree
in $t$ and a rather low degree in $x$ and $y$. Thus, it is natural to see expressions in
$t$ as coefficients instead of polynomials in $t$. In other words, in order to speed up
the attack, we perform the computations in the ring $\mathbb{F}_p(t)[x,y]$ (where
$\mathbb{F}_p(t)$ is the field of fractions) instead of $\mathbb{F}_p[x,y,t]$. In the Level 3 Attack, we
replace the ground field $\mathbb{F}_p(t)$ by a finite field $\mathbb{F}_{p^D} \cong \mathbb{F}_p[t]/(P(t))$ for a large
enough $D$ to avoid the swelling of the intermediate coefficients and to recover
the initial message modulo $P(t)$. Even more efficiently, we can split $P(t)$ into
several irreducible factors $P_i(t)$ of small degree; the Chinese Remainder Theorem
is then used to recombine the congruences and retrieve the original message. In
this third version of the attack, the size of the plaintext determines the number
of congruences required as well as the size of the finite fields considered.
Therefore, the complexity of the Level 3 Attack is expected to be quasi-linear in the
size of the secret key. This behaviour is confirmed by experimental results
together with a complexity analysis. The binary complexity¹ of the Level 3 Attack
is (Theorem 1):

$$\tilde{O}(w^6 \cdot \mathrm{size}(m))$$

where $\mathrm{size}(m)$ denotes the binary size of the plaintext, $w$ is the degree of $X$ in
the variables $x$ and $y$, and $\tilde{O}$ is the "soft Oh" notation (see e.g. [18, Definition
25.8]). Since the size of the secret key is smaller than $\mathrm{size}(m)$, the attack is also
quasi-linear in the size of the secret key. In practice, $\mathrm{size}(m) \approx dw\log(p)$ (where
$d$ is the degree of the secret section). Thus the complexity of the attack is

$$\tilde{O}(w^7 d\log(p)).$$
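The Chinese remaindering step of the Level 3 Attack can be illustrated in its simplest special case. Suppose $p$ is large enough to supply enough distinct points $a_i$. Then one may take linear, pairwise coprime moduli $P_i(t) = t - a_i$. A residue modulo $t - a_i$ is just the value at $a_i$, and CRT recombination reduces to Lagrange interpolation. The Python sketch below assumes exactly that, with a toy prime $p = 101$ and hypothetical function names. It does not cover the recommended parameter $p = 2$. There, irreducible factors of higher degree are needed, and a general polynomial CRT (via extended Euclid in $\mathbb{F}_p[t]$) would replace the interpolation.

```python
P = 101  # toy prime; must supply at least as many distinct points a_i as coefficients


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out


def crt_linear(residues):
    """residues = [(a_i, m(a_i))] with distinct a_i, i.e. m(t) mod (t - a_i).
    Returns the unique polynomial of degree < len(residues) with these residues,
    which is the CRT for the pairwise coprime linear moduli (t - a_i)."""
    n = len(residues)
    result = [0] * n
    for i, (ai, vi) in enumerate(residues):
        basis, denom = [1], 1
        for j, (aj, _) in enumerate(residues):
            if j != i:
                basis = poly_mul(basis, [(-aj) % P, 1])   # times (t - a_j)
                denom = denom * (ai - aj) % P
        scale = vi * pow(denom, -1, P) % P
        for k, c in enumerate(basis):
            result[k] = (result[k] + scale * c) % P
    return result
```

As in the attack, the number of moduli has to grow with the size of the plaintext. Here, $n$ points recover any message polynomial of degree below $n$.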

This can be compared with a lower bound on the binary complexity of the
decryption algorithm (see page S3):

$$\tilde{O}(\log(p)(w^3 d^3 + dw\log(p))).$$

It can be noted that the decryption algorithm is cubic in the size of the secret 
key. Therefore, increasing the size of the secret key does not secure the system, 
since the cost of the decryption algorithm increases faster than the cost of the 
attack. 

We implemented the three variants in Magma 2.15-7. The Level 3 Attack
can break ASC with the parameters recommended in [5] ($d = 50$, $p = 2$, $w = 5$)
in only 0.05 seconds. Experiments confirm that increasing the size of the secret
key with the parameters $p$ and $d$ does not really increase the security of the

1 The binary complexity is the number of arithmetic operations on bits, whereas the 
arithmetic complexity is the number of arithmetic operations in the base ring. 
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system. We are still able to break it in few seconds, even when the size of the 
secret key is more than 10000 bits! We also try to increase the parameter w (the 
degree in x and y of the public surface). For a reasonable size of the public key 
(less than 4000 bits), the message can be recovered in few hours. Finally, we try 
to figure out whether it is possible to secure the system by increasing the size 
of the support of the surface (the parameter k). However, as predicted by the 
complexity analysis, this parameter has very few effect on the complexity of the 
attack. Thereby, we can consider the system as fully broken. 

Structure of the Paper

After this introduction, the paper is organized as follows. In Section 2, we give a short description of the ASC cryptosystem as presented in [3]. Then, we explain the theoretical foundations of the attack. In Section 3, we describe the three variants of the attack, and in Section 4 we apply it to the toy example given in [5]. We perform a precise complexity analysis in Section 5. Finally, in Section 6, we give experimental results showing that the attack is scalable.

2 Description of the Cryptosystem 

We give here a short description of ASC. For a more detailed presentation of this cryptosystem, we refer the reader to [3]. We consider the ring of polynomials $\mathbb{F}_p[x,y,t]$ where $p$ is a prime number. For any polynomial $P \in \mathbb{F}_p[x,y,t]$, $\Lambda_P$ denotes its support in $\mathbb{F}_p(t)[x,y]$ (that is to say, the set of pairs $(i,j) \in \mathbb{N}^2$ such that $t^k x^i y^j$ is a monomial of $P$ for some $k$).

2.1 Parameters 

The cryptosystem ASC has four parameters: $p$ is the cardinality of the ground field, and $d$ is the degree of the secret section. These two parameters are especially important for security: they directly determine the binary size of the secret key, which is $2d\log p$. Another parameter is $w$, the degree in $x$ and $y$ of the public surface $X$. The last parameter is $k$, the cardinality of $\Lambda_X$ (the support of $X$ in $\mathbb{F}_p(t)[x,y]$). The parameters $w$, $d$ and $p$ determine the size of the public key, which is approximately $dw\log(p)$ bits.

2.2 Keys 

The secret key is a pair of polynomials $(u_x(t), u_y(t))$ of degree $d$.

The public key is given by: 

- A surface described by an irreducible polynomial $X(x,y,t) \in \mathbb{F}_p[x,y,t]$ such that $X(u_x(t), u_y(t), t) = 0$ and $\mathrm{card}(\Lambda_X) = k$.

- $\Lambda_m$, the support of the plaintext polynomial, and $\{d^{(m)}_{ij}\}_{(i,j) \in \Lambda_m}$, the degrees of the coefficients.

- $\Lambda_f$, the support of the divisor polynomial, and $\{d^{(f)}_{ij}\}_{(i,j) \in \Lambda_f}$, the degrees of the coefficients.
For encryption/decryption it is required that:

$\Lambda_m \subset \Lambda_f \Lambda_X = \{(i_1 + i_2, j_1 + j_2) : (i_1, j_1) \in \Lambda_f,\ (i_2, j_2) \in \Lambda_X\}$,

$\max\{i : (i,j) \in \Lambda_X\} < \max\{i : (i,j) \in \Lambda_m\} < \max\{i : (i,j) \in \Lambda_f\}$,

$\max\{j : (i,j) \in \Lambda_X\} < \max\{j : (i,j) \in \Lambda_m\} < \max\{j : (i,j) \in \Lambda_f\}$,

$\deg_t(X(x,y,t)) < \max\{d^{(m)}_{ij}\}_{(i,j) \in \Lambda_m} < \max\{d^{(f)}_{ij}\}_{(i,j) \in \Lambda_f}$.


2.3 Encryption/Decryption 

Encryption. Consider a plaintext embedded into a polynomial

$m(x,y,t) = \sum_{(i,j) \in \Lambda_m} m_{ij}(t)\,x^i y^j$

where $\deg(m_{ij}(t)) = d^{(m)}_{ij}$. Choose a random divisor polynomial

$f(x,y,t) = \sum_{(i,j) \in \Lambda_f} f_{ij}(t)\,x^i y^j$

where $\deg(f_{ij}(t)) = d^{(f)}_{ij}$. Then select four random polynomials $r_0, r_1, s_0, s_1$ such that, for $\ell \in \{0,1\}$,

$r_\ell(x,y,t) = \sum_{(i,j) \in \Lambda_f} r^{(\ell)}_{ij}(t)\,x^i y^j, \qquad s_\ell(x,y,t) = \sum_{(i,j) \in \Lambda_X} s^{(\ell)}_{ij}(t)\,x^i y^j$

and, for all $i, j$, $\deg(r^{(\ell)}_{ij}(t)) = \deg(f_{ij}(t))$ and $\deg(s^{(\ell)}_{ij}(t)) = \deg(X_{ij}(t))$, where $X_{ij}(t)$ denotes the coefficient of $x^i y^j$ in $X$. Finally, construct the ciphertext $(F_0(x,y,t), F_1(x,y,t))$ where

$F_0(x,y,t) = m(x,y,t) + f(x,y,t)\,s_0(x,y,t) + X(x,y,t)\,r_0(x,y,t)$,

$F_1(x,y,t) = m(x,y,t) + f(x,y,t)\,s_1(x,y,t) + X(x,y,t)\,r_1(x,y,t)$.

Decryption. Consider $h_\ell(t) = F_\ell(u_x(t), u_y(t), t)$, $\ell \in \{0,1\}$, and compute the difference $h_0(t) - h_1(t) = f(u_x(t), u_y(t), t)\,(s_0(u_x(t), u_y(t), t) - s_1(u_x(t), u_y(t), t))$. Next, find a factor of $h_0(t) - h_1(t)$ whose degree matches $\deg(f(u_x(t), u_y(t), t))$. Let $\tilde{f}(t)$ denote this factor. Then compute $\tilde{m}(u_x(t), u_y(t), t) = h_0(t) \bmod \tilde{f}(t)$. Finally, retrieve $\tilde{m}(x,y,t)$ by solving the linear system:

$\tilde{m}(u_x(t), u_y(t), t) = \sum_{i,j,k} \tilde{m}_{ijk}\, u_x(t)^i u_y(t)^j t^k$.

There are potentially several factors of $h_0(t) - h_1(t)$ whose degree is equal to $\deg(f(u_x(t), u_y(t), t))$, so we have to verify that we picked the right one. To do so, the designers of ASC propose to use a MAC to verify that $\tilde{m}(x,y,t) = m(x,y,t)$. If the verification fails, we start again with another factor of $h_0(t) - h_1(t)$.

To find factors of $h_0(t) - h_1(t)$ whose degree matches $\deg(f(u_x(t), u_y(t), t))$, the designers propose to factor $h_0(t) - h_1(t)$ and then recombine its irreducible factors by solving a knapsack problem. However, the knapsack problem is NP-hard [10]. Therefore, as pointed out in [3], it is not clear whether the decryption algorithm remains practicable when the security parameters are high.
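To make the combinatorial side of this recombination step concrete, here is a minimal Python sketch of a brute-force knapsack over factor degrees. It assumes that the degrees of the irreducible factors of $h_0(t) - h_1(t)$ are already known (listed with multiplicity) together with the target degree $\deg(f(u_x(t), u_y(t), t))$; the function name and the sample degrees are hypothetical, and each returned subset would still have to be checked, e.g. with the MAC.

```python
from itertools import combinations

def candidate_factor_sets(factor_degrees, target_degree):
    """Enumerate index subsets of irreducible factors whose degrees sum to target_degree.

    factor_degrees[i] is the degree in t of the i-th irreducible factor of
    h_0(t) - h_1(t) (repeated factors listed several times); target_degree is
    deg f(u_x(t), u_y(t), t). This is a plain subset-sum search over all 2^n subsets.
    """
    n = len(factor_degrees)
    candidates = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if sum(factor_degrees[i] for i in subset) == target_degree:
                candidates.append(subset)
    return candidates

# Hypothetical factor degrees, for illustration only.
print(candidate_factor_sets([1, 2, 2, 3], 3))  # [(3,), (0, 1), (0, 2)]
```

With $n$ irreducible factors the loop examines $2^n - 1$ subsets, which is why this step can become the bottleneck of decryption when $h_0 - h_1$ splits into many small factors.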
2.4 Security of the System

The designers of the cryptosystem propose the following parameters:

- $p = 2$.

- $d$ should be greater than 50.

- $w = \deg_{xy}(X) = \max\{i + j : (i,j) \in \Lambda_X\}$ should be greater than 5.

- The lower bound on $k$ is 3.

The size of the secret key is around 100 bits and the size of the public key is close to 500 bits. According to the designers of ASC, there is so far no known attack faster than exhaustive search for these parameters. Therefore, the security level of ASC is expected to be the cost of exhaustive search of the secret key, namely

$p^{2d+2}$.
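The size figures quoted above follow directly from the formulas of Section 2.1. The short Python sketch below evaluates them; the function name is ours, and the $dw\log(p)$ public-key estimate ignores constant overheads, so it should only be read as an order of magnitude.

```python
import math

def asc_size_estimates(p: int, d: int, w: int) -> dict:
    """Rough sizes in bits for ASC parameters p, d, w (formulas of Sections 2.1 and 2.4)."""
    log_p = math.log2(p)
    return {
        # secret section (u_x, u_y): two polynomials of degree d over F_p
        "secret_key": 2 * d * log_p,
        # public key: approximately d * w * log(p) bits
        "public_key": d * w * log_p,
        # log2 of the exhaustive-search cost p^(2d+2)
        "exhaustive_search": (2 * d + 2) * log_p,
    }

print(asc_size_estimates(p=2, d=50, w=5))
```

For $p = 2$, $d = 50$, $w = 5$ this gives a 100-bit secret key and an exhaustive-search cost of $2^{102}$.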

3 Description of the Attack 

Overview of the Attack 

In this section, we propose a message recovery attack on the cryptosystem described above.

The main point of the attack is to decompose ideals instead of factoring the univariate polynomial obtained by evaluating $F_0 - F_1$ at the section $(u_x, u_y)$. This way, we can implicitly manipulate the so-called divisor polynomial $f$ occurring in the decryption process. Consequently, we avoid solving the underlying Section Finding Problem, and we obtain a polynomial-time attack on ASC.

First, we present a high-level and deterministic version of the attack (Algorithm 1) based on two fundamental lemmas. Then, we speed up the algorithm by considering the field of fractions $\mathbb{F}_p(t)$ (Algorithm 2). Indeed, polynomials occurring in ASC have a high degree in $t$. Since the complexity of Gröbner basis algorithms is linear in the complexity of the arithmetic in the ground field, it seems natural to compute in the field of fractions $\mathbb{F}_p(t)$. Finally, we use a modular approach to implement the attack efficiently: we perform computations in some well-chosen finite fields $\mathbb{F}_p[t]/(P)$ and recombine the results using the Chinese Remainder Theorem (Algorithm 3). This way, the size of the coefficients of intermediate values stays bounded (these coefficients can be huge when computations are performed in the field of fractions). This allows us to break bigger instances of ASC. In particular, we are able to break the system with recommended parameters in 0.05 seconds. Furthermore, this allows us to perform a precise complexity analysis and to show that this attack is quasi-linear in the size of the secret key. Experimentally, with this technique we are able to break some instances where the size of the secret key is greater than 10000 bits.

Now we compare the efficiency of the three versions of the attack on a small example. We consider the parameters $p = 11$, $d = 8$, $w = 5$ and $k = 3$, and we use our Magma implementation. The Level 1 Attack (code given in the Appendix) recovers the plaintext in 136 seconds. As predicted, the Level 2 Attack is faster and breaks the system in 74 seconds. Using the modular approach in the Level 3 Attack really speeds up the computations: it retrieves the plaintext in 0.06 seconds.
3.1 Level 1 Attack: Decomposition of Ideals

The two following lemmas are the key elements of the attack. 

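Lemma 1. Let $I$ be the ideal $I = \langle F_0 - F_1, X\rangle \subset \mathbb{F}_p[x,y,t]$. Then $I = I_1 \cap I_2$ where $I_1 = \langle f, X\rangle$ and $I_2 = \langle s_0 - s_1, X\rangle$. Generically, the ideals $I_1$ and $I_2$ are prime ideals of $\mathbb{F}_p[x,y,t]$.

Proof. $I = \langle F_0 - F_1, X\rangle = \langle f\,(s_0 - s_1), X\rangle = I_1 \cap I_2$.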
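The one-line proof relies on the following identity, obtained by subtracting the two ciphertext equations of Section 2.3 (a worked version of the step, written out for convenience):

```latex
\begin{aligned}
F_0 - F_1 &= f\,(s_0 - s_1) + X\,(r_0 - r_1), \\
\langle F_0 - F_1,\; X \rangle &= \langle f\,(s_0 - s_1),\; X \rangle
  \subseteq \langle f, X \rangle \cap \langle s_0 - s_1, X \rangle = I_1 \cap I_2 .
\end{aligned}
```

The inclusion in the second line holds for any polynomials; the reverse inclusion, and hence the equality stated in Lemma 1, is the part that relies on the generic situation.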

Lemma 1 shows that, once we have managed to decompose the ideal $\langle F_0 - F_1, X\rangle = \langle f\,(s_0 - s_1), X\rangle$, we can manipulate the polynomial $f$ implicitly through $I_1$.

Remark 1. In order to decompose $I$, one strategy is to eliminate $x$ from $I$ by computing a Gröbner basis of $I \cap \mathbb{F}_p[y,t]$. Generically, this Gröbner basis contains only one polynomial $Q$. If $p$ is big enough, $Q$ in general has two factors which depend on $y$ and $t$ (we do not consider the factors which are in $\mathbb{F}_p[t]$). This fact is confirmed experimentally. The two factors correspond to $I_1$ and $I_2$. Then, we can construct $I_1$ (resp. $I_2$) by adding to $I$ an appropriate factor of $Q$. Since $\deg_y(f) > \deg_y(s_1 - s_0)$, the factor of $Q$ with the highest degree in $y$ is the one corresponding to $I_1$. To factor the bivariate polynomial $Q$ efficiently, we can use for instance the algorithm in [11].

Lemma 2. Let $J$ be the ideal of $\mathbb{F}_p[x,y,t]$ defined by $J = \langle F_0, F_1, X\rangle + I_1$. Then $m(x,y,t) \in J$. Moreover, $J$ is a zero-dimensional ideal.

Proof. $J = \langle F_0, F_1, X\rangle + I_1 = \langle F_0, F_1, X, f\rangle = \langle m, f, X\rangle$.
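The proof of Lemma 2 only uses the fact that each ciphertext component reduces to the plaintext modulo $\langle f, X\rangle$, which follows from the encryption formulas of Section 2.3:

```latex
\begin{aligned}
F_\ell &= m + f\,s_\ell + X\,r_\ell \equiv m \pmod{\langle f, X \rangle}, \qquad \ell \in \{0, 1\}, \\
J &= \langle F_0, F_1, X \rangle + \langle f, X \rangle = \langle F_0, F_1, X, f \rangle = \langle m, f, X \rangle .
\end{aligned}
```

The same computation with $F_\ell + z$ in place of $F_\ell$ gives $\langle m + z, f, X\rangle$, which is the form used in Lemma 4.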

Remark 2. Lemma 2 shows that we can explicitly compute a multivariate ideal which contains $m$. Since we know $\Lambda_m$, we can recover $m$ by solving the following linear system:

$NF_J(m) = \sum_{(i,j) \in \Lambda_m} \sum_{k=0}^{d^{(m)}_{ij}} m_{ijk}\, NF_J(x^i y^j t^k) = 0$

where $NF_J$ denotes the normal form with respect to the ideal $J$ for a chosen monomial ordering (the definition of the normal form is given in the Appendix). Since $\lambda m \in J$ for all $\lambda \in \mathbb{F}_p$, we retrieve $m$ only up to multiplication by a scalar.
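Once the normal forms are available, this last step is plain linear algebra over $\mathbb{F}_p$. The sketch below assumes (our assumption, not a detail given in the text) that each $NF_J(x^i y^j t^k)$ has already been written as a coordinate vector in a basis of $\mathbb{F}_p[x,y,t]/J$, for instance the standard monomials of a Gröbner basis; these vectors are the columns of the matrix, and a kernel vector gives the coefficients $m_{ijk}$ up to a scalar. The function name `nullspace_mod_p` is hypothetical.

```python
def nullspace_mod_p(rows, p):
    """Return a basis of {v : A v = 0} over F_p (p prime), with A given as a list of rows."""
    A = [[x % p for x in row] for row in rows]
    n_cols = len(A[0]) if A else 0
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it corresponds to a free variable
        A[r], A[pivot] = A[pivot], A[r]
        inv = pow(A[r][c], p - 2, p)  # inverse via Fermat's little theorem
        A[r] = [x * inv % p for x in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [(x - factor * y) % p for x, y in zip(A[i], A[r])]
        pivot_cols.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivot_cols):
        v = [0] * n_cols
        v[free] = 1
        for row, pc in enumerate(pivot_cols):
            v[pc] = -A[row][free] % p  # back-substitution from the reduced row echelon form
        basis.append(v)
    return basis

# Toy overdetermined system over F_7 (3 equations, 3 unknowns, rank 2).
A = [[1, 2, 3], [0, 1, 4], [1, 3, 0]]
print(nullspace_mod_p(A, 7))  # [[5, 3, 1]]
```

A one-dimensional kernel corresponds to the expected situation where $m$ is determined up to a scalar; a larger kernel would mean that the system is underdetermined.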

Remark 3. For efficiency purposes, we compute the Gröbner bases with respect to the graded reverse lexicographical ordering (Definition 1 in the Appendix). Instead of computing the Gröbner basis of $\langle F_0 - F_1, X\rangle \cap \mathbb{F}_p[y,t]$, it is also possible to compute a resultant to eliminate the variable $x$.

Remark 4. The normal form $NF_J$ is a linear map from $\mathbb{F}_p[x,y,t]$ onto $\mathbb{F}_p[x,y,t]/J$. In the last step of the attack, we are searching for the intersection of its kernel with the $\mathbb{F}_p$-linear subspace generated by $\Gamma_m$ (where $\Gamma_m$ denotes the support of $m$ in $\mathbb{F}_p[x,y,t]$). Therefore, the linear system has $\mathrm{card}(\Gamma_m)$ unknowns
Algorithm 1. Level 1 Attack

1: Compute a Gröbner basis of the ideal $\langle F_0 - F_1, X\rangle \cap \mathbb{F}_p[y,t]$. Generically this Gröbner basis contains only one polynomial $Q(y,t)$.

2: Factor $Q = \prod_i Q_i(y,t)$. Let $Q_0(y,t) \in \mathbb{F}_p[y,t]$ denote an irreducible factor with highest degree with respect to $y$.

3: Compute a Gröbner basis of the ideal $J = \langle F_0, F_1, X, Q_0\rangle$.

4: To retrieve the plaintext (up to multiplication by a scalar in $\mathbb{F}_p$), solve the linear system over $\mathbb{F}_p$

$\sum_{(i,j) \in \Lambda_m} \sum_{k=0}^{d^{(m)}_{ij}} m_{ijk}\, NF_J(x^i y^j t^k) = 0.$

If the system has no nonzero solution, go back to Step 2 and pick another factor of $Q$.


and $\deg(J)$ equations ($\deg(J) = \dim(\mathbb{F}_p[x,y,t]/J)$ when $\mathbb{F}_p[x,y,t]/J$ is seen as an $\mathbb{F}_p$-vector space). From the Bézout bound [15], $\deg(J) \approx \deg(m)\deg(X)\deg(f)$. The decryption algorithm requires that $\deg(m(u_x,u_y,t)) > \mathrm{card}(\Gamma_m)$ (in order to solve the final linear system), and one can remark that $\deg(X)\deg(f) > \deg(m(u_x,u_y,t)) \approx d\deg_{xy}(m) + \deg_t(m)$ (since $\deg_{xy}(f) > \deg_{xy}(m)$, $\deg_t(f) > \deg_t(m)$ and $\deg(X) > d$). Therefore, the linear system has more equations than unknowns: $\mathrm{card}(\Gamma_m) < \deg(m(u_x,u_y,t)) < \deg(X)\deg(f) < \deg(J)$.


3.2 Level 2 Attack: Computing in the Field of Fractions $\mathbb{F}_p(t)$

Polynomials appearing in ASC have a high total degree, but their degree in the variables $x$ and $y$ is low. Hence, it is natural to consider these polynomials as bivariate polynomials in $x$ and $y$ over the field of fractions $\mathbb{F}_p(t)$. Indeed, the degrees in $x$ and $y$ are completely independent of the security parameter $d$. In this section, we explain how to adapt the attack to this context. We expect a lower complexity, since many operations on ideals (for instance Gröbner basis computations) are linear in the complexity of the arithmetic in the ground field.

From now on, $\mathbb{K}$ denotes the field of fractions $\mathbb{F}_p(t)$.

First, we need to transpose the key lemmas to this new context. This can be done for Lemma 1 without any major modification:

Lemma 3. Let $I$ be the ideal $I = \langle F_0 - F_1, X\rangle$ (seen as an ideal of $\mathbb{K}[x,y]$). Then there exist two strict ideals $I_1$ and $I_2$ of $\mathbb{K}[x,y]$ such that $I = I_1 \cap I_2$ and

$\langle f, X\rangle \subset I_1$.

Unfortunately, Lemma 2 cannot be directly transposed to the context of the field of fractions. Indeed, the variety of the ideal $J = \langle F_0, F_1, X\rangle + I_1 = \langle m, f, X\rangle$ (seen as an ideal of $\mathbb{K}[x,y]$) is generically empty, since it is generated by three independent equations in two variables. Therefore, we have to introduce a new variable $z$ if we want to keep the ideal zero-dimensional and strictly included in $\mathbb{K}[x,y,z]$. Roughly speaking, the role of $z$ is to deform the ideal $\langle m, f, X\rangle$ in order to introduce new elements in the variety:
Algorithm 2. Level 2 Attack: computing in the field of fractions $\mathbb{K} = \mathbb{F}_p(t)$

1: Compute the resultant $\mathrm{Res}_x(F_0 - F_1, X) \in \mathbb{K}[y]$.

2: Factor the resultant $\mathrm{Res}_x(F_0 - F_1, X) = \prod_i Q_i(y)$. Let $Q_0(y) \in \mathbb{K}[y]$ denote an irreducible factor of highest degree in $y$.

3: Compute a grevlex Gröbner basis of the ideal $J = \langle F_0 + z, F_1 + z, X, Q_0\rangle \subset \mathbb{K}[x,y,z]$.

4: Consider the following linear system over $\mathbb{K}$:

$NF_J(z) + \sum_{(i,j) \in \Lambda_m} m_{ij}(t)\, NF_J(x^i y^j) = 0.$

If the system has no solution, then go back to Step 2 and choose another factor of the resultant.

5: Return $m = \sum_{(i,j) \in \Lambda_m} m_{ij}(t)\, x^i y^j$ where $(m_{ij}(t))$ is the unique solution of the linear system.


Lemma 4. Let $J \subset \mathbb{K}[x,y,z]$ be the ideal $J = \langle F_0 + z, F_1 + z, X\rangle + I_1$. Then $m(x,y,t) + z \in J$. Moreover, $J$ is a zero-dimensional ideal.

Proof. $\langle F_0 + z, F_1 + z, X\rangle + I_1 = \langle F_0 + z, F_1 + z, X, f\rangle = \langle m + z, f, X\rangle$.

To be able to recover the plaintext, we need to solve a linear system with $\mathrm{card}(\Lambda_m)$ unknowns and $\deg(J)$ equations. In practice, there are more equations than unknowns. Thus, if we choose a wrong factor of the resultant (a factor which is not a divisor of $f$), then the linear system generically has no solution, and we just have to restart from Step 2 until we find an appropriate factor. In practice, the irreducible factor of the resultant with the highest degree in $y$ is almost always a good choice.

Remark 5. It is also possible to combine the two versions of the attack by computing a Gröbner basis of the elimination ideal and factoring it in $\mathbb{F}_p[x,y,t]$, as in the Level 1 Attack (Steps 1 and 2 in Algorithm 1). Then, once we have found $Q_0 \in \mathbb{F}_p[x,y,t]$, we retrieve the message by computing a Gröbner basis of $J = \langle F_0 + z, F_1 + z, X, Q_0\rangle \subset \mathbb{K}[x,y,z]$ in the field of fractions (Steps 3, 4 and 5 in Algorithm 2).


3.3 Level 3 Attack: Computing in Finite Fields $\mathbb{F}_{p^m}$

In this section, we study how to implement the attack efficiently in practice. In order to speed up the attack and to compute efficiently in the field of fractions, we perform all computations modulo polynomials of $\mathbb{F}_p[t]$. Indeed, a bound on the degree of $m$ with respect to $t$ is known, since $\deg_t(m) \le \max\{d^{(m)}_{ij}\}$.

We choose a constant $C$ and $n = \deg_t(m)\log(p)/C$ irreducible polynomials $P_1, \dots, P_n$ of degree close to $C/\log(p)$ such that $\sum_i \deg(P_i) > \deg_t(m)$. Then, for each $P_i$, we consider

$\mathbb{F}_p[t]/(P_i) = \mathbb{F}_{p^{\deg(P_i)}}$.
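The choice of the moduli $P_i$ can be sketched as follows in Python. This is our own illustration, meant for small $p$ only: it enumerates monic polynomials of degree $\delta \approx C/\log_2(p)$ and keeps the irreducible ones, using Ben-Or's irreducibility test (a polynomial $f$ of degree $n$ over $\mathbb{F}_p$ is irreducible if and only if $\gcd(f, t^{p^i} - t) = 1$ for $1 \le i \le n/2$), until $\sum_i \deg(P_i) > \deg_t(m)$. Polynomials are coefficient lists, lowest degree first, and all names are hypothetical.

```python
import math
from itertools import product

P = 17  # characteristic of the ground field (the example of Section 4 uses p = 17)

def trim(a):
    """Reduce coefficients mod P and drop leading zeros (lowest degree first)."""
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def sub(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0) for i in range(n)])

def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def rem(a, b):
    """Remainder of a modulo a nonzero b in F_P[t]."""
    a, b = trim(a), trim(b)
    inv_lead = pow(b[-1], P - 2, P)
    while len(a) >= len(b):
        c = a[-1] * inv_lead % P
        a = sub(a, [0] * (len(a) - len(b)) + [c * y for y in b])
    return a

def gcd(a, b):
    a, b = trim(a), trim(b)
    while b:
        a, b = b, rem(a, b)
    return a

def powmod(base, e, m):
    """base**e mod m by square-and-multiply."""
    result, base = [1], rem(base, m)
    while e:
        if e & 1:
            result = rem(mul(result, base), m)
        base = rem(mul(base, base), m)
        e >>= 1
    return result

def is_irreducible(f):
    """Ben-Or test for a polynomial f over F_P."""
    n = len(trim(f)) - 1
    h = [0, 1]                                  # the polynomial t
    for _ in range(n // 2):
        h = powmod(h, P, f)                     # h = t^(P^i) mod f
        if len(gcd(f, sub(h, [0, 1]))) != 1:    # nontrivial common factor found
            return False
    return n >= 1

def choose_moduli(deg_t_m, C):
    """Distinct monic irreducible P_i of degree ~ C / log2(P) with sum of degrees > deg_t_m."""
    delta = max(1, round(C / math.log2(P)))
    chosen = []
    for tail in product(range(P), repeat=delta):
        f = list(reversed(tail)) + [1]          # monic; constant term varies fastest
        if is_irreducible(f):
            chosen.append(f)
            if len(chosen) * delta > deg_t_m:
                return chosen
    raise ValueError("not enough irreducible polynomials of this degree")

moduli = choose_moduli(deg_t_m=17, C=20)
print([len(f) - 1 for f in moduli])  # [5, 5, 5, 5]
```

With $p = 17$, $\deg_t(m) = 17$ and $C = 20$ it returns four irreducible polynomials of degree 5, the same shape as the moduli used in the example of Section 4 (not necessarily the same polynomials).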
Algorithm 3. Level 3 Attack: computing in the finite fields $\mathbb{K} = \mathbb{F}_p[t]/(P_i)$

1: Choose $n \approx \deg_t(m)\log(p)/C$ irreducible polynomials $P_1, \dots, P_n$ of degree $\approx C/\log(p)$ such that $\sum_i \deg(P_i) > \deg_t(m)$.

2: for $i$ from 1 to $n$ do

3: Consider $\mathbb{K} = \mathbb{F}_p[t]/(P_i)$.

4: Compute the resultant $\mathrm{Res}_x(F_0 - F_1, X) \in \mathbb{K}[y]$.

5: Factor the resultant $\mathrm{Res}_x(F_0 - F_1, X) = \prod_j Q_j(y)$. Let $Q_0(y) \in \mathbb{K}[y]$ denote an irreducible factor of highest degree in $y$.

6: Compute a grevlex Gröbner basis of the ideal $J = \langle F_0 + z, F_1 + z, X, Q_0\rangle \subset \mathbb{K}[x,y,z]$.

7: Consider the following linear system over $\mathbb{K}$:

$NF_J(z) + \sum_{(i,j) \in \Lambda_m} m_{ij}(t)\, NF_J(x^i y^j) = 0.$

If the system has no solution, then go back to Step 5 and choose another factor of the resultant.

8: Retrieve a congruence $m \bmod P_i = \sum_{(i,j) \in \Lambda_m} m_{ij}(t)\, x^i y^j$, where $(m_{ij}(t))$ is the solution of the linear system.

9: end for

10: Use the CRT to retrieve $m = m \bmod \prod_i P_i$.


Considering all computations in $\mathbb{K} = \mathbb{F}_p[t]/(P_i)$ instead of $\mathbb{F}_p(t)$, the attack yields $m \bmod P_i$. Finally, we use the Chinese Remainder Theorem (CRT) to recover $m \bmod \prod_i P_i$. Since $\deg(\prod_i P_i) > \deg_t(m)$, we retrieve the plaintext.
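The recombination step can be sketched with the textbook polynomial CRT over $\mathbb{F}_p$; in Algorithm 3 it would be applied coefficient by coefficient, once for each $m_{ij}(t)$. The Python sketch below is a minimal, unoptimized version (schoolbook arithmetic, not the quasi-linear CRT assumed in the complexity analysis); polynomials are coefficient lists, lowest degree first, and all names are ours.

```python
P = 17  # field characteristic

def trim(a):
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def sub(a, b):
    return add(a, [-c for c in b])

def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """Quotient and remainder of a by a nonzero b in F_P[t]."""
    a, b = trim(a), trim(b)
    inv_lead = pow(b[-1], P - 2, P)
    q = [0] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        c = a[-1] * inv_lead % P
        q[shift] = c
        a = sub(a, [0] * shift + [c * y for y in b])
    return trim(q), a

def inverse_mod(a, m):
    """Inverse of a in F_P[t]/(m) by the extended Euclidean algorithm."""
    r0, r1 = trim(m), divmod_poly(a, m)[1]
    s0, s1 = [], [1]  # invariant: s_k * a = r_k (mod m)
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, sub(s0, mul(q, s1))
    if len(r0) != 1:
        raise ValueError("polynomial is not invertible modulo m")
    return divmod_poly([c * pow(r0[0], P - 2, P) for c in s0], m)[1]

def crt(residues, moduli):
    """Unique r with deg r < sum(deg m_i) and r = residues[i] mod moduli[i] (pairwise coprime moduli)."""
    M = [1]
    for m_i in moduli:
        M = mul(M, m_i)
    result = []
    for r_i, m_i in zip(residues, moduli):
        M_i = divmod_poly(M, m_i)[0]   # product of the other moduli
        y_i = inverse_mod(M_i, m_i)    # M_i * y_i = 1 mod m_i
        result = add(result, mul(mul(r_i, y_i), M_i))
    return divmod_poly(result, M)[1]

# Recover a degree-5 polynomial from its residues modulo t, t - 1, ..., t - 5.
moduli = [[(-a) % P, 1] for a in range(6)]
m = [2, 16, 0, 7, 1, 3]  # 2 + 16t + 7t^3 + t^4 + 3t^5
residues = [divmod_poly(m, m_i)[1] for m_i in moduli]
print(crt(residues, moduli) == trim(m))
```

The product of the six moduli has degree 6, larger than $\deg(m) = 5$, which is exactly the condition $\deg(\prod_i P_i) > \deg_t(m)$ used above.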

Remark 6. The linear system at Step 7 in Algorithm 3 has only $\mathrm{card}(\Lambda_m)$ unknowns and $\deg(J) \approx \deg_{xy}(m)\deg_{xy}(f)\deg_{xy}(X)$ equations. For practical parameters, $\mathrm{card}(\Lambda_m) \approx k$ is smaller than $\deg(J)$; thus the linear system is overdetermined and in general has only one solution. This fact is confirmed by experiments.

The value $\sum_i \deg(P_i) \approx \deg_t(m)$ depends only on the size of the plaintext. Therefore, the number of times we have to run the main loop of Algorithm 3 is linear in the size of the plaintext. Since the cost of arithmetic operations in $\mathbb{F}_p[t]/(P_i)$ only depends on $C$ (a constant chosen by the attacker), we expect the Level 3 Attack to be linear or quasi-linear in the size of the plaintext. This expectation is confirmed by a complexity analysis and by experimental results. We also note that the main loop of Algorithm 3 can easily be parallelized.


4 A Concrete Example 

We consider here the toy example given in [3]. We have

- $p = 17$.

- The secret key is $(u_x, u_y) = (14t^3 + 12t^2 + 5t + 1,\ 11t^3 + 3t^2 + 5t + 4)$.

- The public surface is

$X = (t + 10)x^3y^2 + (16t^2 + 7t + 4)xy^2 + (3t^{16} + 8t^{15} + 13t^{14} + 8t^{13} + 3t^{12} + 12t^{11} + 4t^{10} + 8t^9 + 7t^8 + 4t^7 + 13t^6 + 2t^5 + 5t^4 + 4t^3 + 14t^2 + 9t + 14)$.

- The supports of $m$ and $f$ are

$\Lambda_m = \{(4,4), (0,0)\}$, $d^{(m)}_{44} = 17$, $d^{(m)}_{00} = 17$,

$\Lambda_f = \{(5,5), (1,2), (0,0)\}$, $d^{(f)}_{55} = 13$, $d^{(f)}_{12} = 11$, $d^{(f)}_{00} = 18$.
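As a sanity check on the data above, a few lines of Python verify that the public surface vanishes on the secret section, i.e. $X(u_x(t), u_y(t), t) = 0$ in $\mathbb{F}_{17}[t]$, as required in Section 2.2. This is a sketch written for this example only (helper names are ours); polynomials are coefficient lists, lowest degree first.

```python
P = 17  # field characteristic in the toy example

def padd(a, b):
    """Sum of two polynomials over F_P (coefficient lists, lowest degree first)."""
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % P for i in range(n)]

def pmul(a, b):
    """Product of two polynomials over F_P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def ppow(a, e):
    """a**e over F_P by repeated multiplication (e is small here)."""
    out = [1]
    for _ in range(e):
        out = pmul(out, a)
    return out

def from_high(*coeffs):
    """Coefficient list (lowest degree first) from coefficients written highest degree first."""
    return [c % P for c in reversed(coeffs)]

# Secret section (u_x, u_y).
u_x = from_high(14, 12, 5, 1)   # 14t^3 + 12t^2 + 5t + 1
u_y = from_high(11, 3, 5, 4)    # 11t^3 + 3t^2 + 5t + 4

# Public surface X = sum over (i, j) in Lambda_X of X_ij(t) x^i y^j.
X_terms = {
    (3, 2): from_high(1, 10),
    (1, 2): from_high(16, 7, 4),
    (0, 0): from_high(3, 8, 13, 8, 3, 12, 4, 8, 7, 4, 13, 2, 5, 4, 14, 9, 14),
}

def evaluate_on_section(terms, ux, uy):
    """Return X(u_x(t), u_y(t), t) as a coefficient list over F_P."""
    total = [0]
    for (i, j), coeff in terms.items():
        total = padd(total, pmul(coeff, pmul(ppow(ux, i), ppow(uy, j))))
    return total

residue = evaluate_on_section(X_terms, u_x, u_y)
print("X vanishes on the section:", all(c == 0 for c in residue))
```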

Here we show how to recover the message $m$ from the ciphertext $(F_0, F_1)$ (given in [3]) with the Level 3 Attack:

1. Since $\deg_t(m) = 17$, we choose (for instance) irreducible polynomials $P_1, P_2, P_3, P_4 \in \mathbb{F}_p[t]$ such that $\sum_i \deg(P_i) > 18$. In particular,

$P_1 = t^5 + t + 14$,

$P_2 = t^5 + 14t^4 + 4t^3 + 4t + 4$,

$P_3 = t^5 + 9t^4 + 15t^3 + 8t^2 + 4t + 8$,

$P_4 = t^5 + 11t^4 + 11t^3 + 8t^2 + 7t + 8$.

First, we consider the finite field $\mathbb{K} = \mathbb{F}_p[t]/(P_1)$.

2. Compute the resultant in $\mathbb{K}[y]$:

$\mathrm{Res}_x(F_0 - F_1, X) = (9t^4 + 14t^3 + 4t^2 + 6t + 13)y^{30} + (5t^4 + t^3 + 14t^2 + 15t + 8)y^{27} + (6t^4 + 9t^3 + 10t^2 + 7t + 14)y^{26} + (7t^4 + 4t^3 + 8t^2 + 5t + 8)y^{25} + (8t^4 + 4t^3 + 7t^2 + 7t + 6)y^{24} + (12t^4 + 9t^3 + 8t^2 + 13t)y^{23} + (9t^4 + 4t^3 + 9t^2 + 15t + 6)y^{22} + (3t^4 + 6t^3 + 10t^2 + 6t + 6)y^{21} + (9t^4 + 9t^3 + 13t^2 + 15t + 6)y^{20} + (4t^4 + 4t^3 + 15t^2)y^{19} + (2t^4 + 11t^3 + 2t^2 + 5t + 2)y^{16}$.

3. Then factor it in $\mathbb{K}[y]$:

$\mathrm{Res}_x(F_0 - F_1, X) = y^{16}(y + 8t^4 + 3t^3 + 16t^2 + 8t + 2)(y^2 + 2t^4 + 14t^3 + 14t^2 + 6t + 10)(y^2 + 15t^4 + 3t^3 + 3t^2 + 11t + 7)(y^2 + (14t^4 + 7t^3 + 4t)y + 13t^4 + 10t^3 + 7t^2 + 8t + 1)(y^7 + (12t^4 + 7t^3 + t^2 + 5t + 15)y^6 + (t^4 + 5t^3 + 7t^2 + 12t + 11)y^5 + (9t^4 + 14t^3 + 5t^2 + 10t + 10)y^4 + (4t^4 + 7t^3 + t^2 + 7t + 14)y^3 + (11t^4 + 13t^3 + 12t^2 + 8t + 4)y^2 + (15t^4 + 9t^3 + 16t^2 + 14t + 14)y + 14t^4 + 3t^3 + 9t^2 + 15t + 8)$.

4. Let $Q_0$ be an irreducible factor with highest degree:

$Q_0 = y^7 + (12t^4 + 7t^3 + t^2 + 5t + 15)y^6 + (t^4 + 5t^3 + 7t^2 + 12t + 11)y^5 + (9t^4 + 14t^3 + 5t^2 + 10t + 10)y^4 + (4t^4 + 7t^3 + t^2 + 7t + 14)y^3 + (11t^4 + 13t^3 + 12t^2 + 8t + 4)y^2 + (15t^4 + 9t^3 + 16t^2 + 14t + 14)y + (14t^4 + 3t^3 + 9t^2 + 15t + 8)$.

5. Compute a Gröbner basis $G$ with respect to the grevlex ordering of the ideal $J = \langle F_0 + z, F_1 + z, X, Q_0\rangle \subset \mathbb{K}[x,y,z]$.

6. Since $\Lambda_m = \{(0,0), (4,4)\}$, compute $NF_J(x^4y^4)$:

$NF_J(x^4y^4) = N_1 z + N_2 = (15t^4 + 3t^3 + t^2 + 13t + 16)z + (5t^4 + 11t^2 + t + 7)$.

7. Solve the linear system $z + m_{44}NF_J(x^4y^4) + m_{00} = 0$ over $\mathbb{K}$:

$m_{00} = N_2/N_1 \bmod P_1$,

$m_{44} = -1/N_1 \bmod P_1$.

8. Recover a congruence: $m \equiv m_{00} + m_{44}x^4y^4 \bmod P_1$.

9. Repeat the process with $P_2$, $P_3$ and $P_4$.

10. Use the CRT to retrieve $m = m \bmod \prod_i P_i$:

$m = (5t^{17} + 15t^{16} + 4t^{15} + 9t^{14} + 7t^{13} + 2t^{12} + 3t^{11} + 8t^{10} + 11t^9 + 6t^8 + 10t^7 + 7t^6 + t^5 + 14t^4 + 3t^3 + 12t^2 + 11t + 2)x^4y^4 + (6t^{17} + 3t^{16} + 11t^{15} + t^{13} + 10t^{12} + 3t^{11} + 8t^{10} + 6t^9 + 13t^8 + 2t^7 + 2t^6 + 10t^5 + 5t^4 + 2t^3 + 15t^2 + 3t + 11)$.
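Steps 6 and 7 reduce to an inversion in $\mathbb{K} = \mathbb{F}_{17}[t]/(P_1)$. The Python sketch below (our own helper code, based on the extended Euclidean algorithm) computes $m_{44} = -1/N_1$ and $m_{00} = N_2/N_1$ modulo $P_1$ from the values of $N_1$ and $N_2$ given in Step 6 and checks that they satisfy the linear system of Step 7. It assumes that $N_1$ is invertible modulo $P_1$, which holds when $P_1$ is irreducible, as stated in Step 1.

```python
P = 17  # field characteristic in the toy example

def trim(a):
    """Reduce coefficients mod P and drop leading zeros (lowest degree first)."""
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def sub(a, b):
    return add(a, [-c for c in b])

def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """Quotient and remainder of a by a nonzero b in F_P[t]."""
    a, b = trim(a), trim(b)
    inv_lead = pow(b[-1], P - 2, P)
    q = [0] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        c = a[-1] * inv_lead % P
        q[shift] = c
        a = sub(a, [0] * shift + [c * y for y in b])
    return trim(q), a

def inverse_mod(a, m):
    """Inverse of a in F_P[t]/(m) by the extended Euclidean algorithm."""
    r0, r1 = trim(m), divmod_poly(a, m)[1]
    s0, s1 = [], [1]  # invariant: s_k * a = r_k (mod m)
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, sub(s0, mul(q, s1))
    if len(r0) != 1:
        raise ValueError("polynomial is not invertible modulo m")
    return divmod_poly([c * pow(r0[0], P - 2, P) for c in s0], m)[1]

def from_high(*coeffs):
    """Coefficient list (lowest degree first) from coefficients written highest degree first."""
    return trim(list(reversed(coeffs)))

P1 = from_high(1, 0, 0, 0, 1, 14)   # t^5 + t + 14
N1 = from_high(15, 3, 1, 13, 16)    # coefficient of z in NF_J(x^4 y^4)
N2 = from_high(5, 0, 11, 1, 7)      # constant part of NF_J(x^4 y^4)

inv_N1 = inverse_mod(N1, P1)
m44 = divmod_poly(sub([], inv_N1), P1)[1]    # m_44 = -1/N_1 mod P_1
m00 = divmod_poly(mul(N2, inv_N1), P1)[1]    # m_00 = N_2/N_1 mod P_1
print("m44 mod P1 (lowest degree first):", m44)
print("m00 mod P1 (lowest degree first):", m00)
```

Running the same computation for $P_2$, $P_3$ and $P_4$ and recombining coefficient by coefficient with the CRT would give the two coefficients of $m$ in Step 10.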

5 Complexity Analysis 

In this part, we investigate the complexity of the Level 3 Attack. To simplify the notation, we suppose here that the complexity of multiplying two $n \times n$ matrices is $O(n^3)$. Recall that $C$ is a parameter chosen by the attacker. This parameter fixes the size of the finite fields considered. Indeed, we choose finite fields $\mathbb{K} = \mathbb{F}_p[t]/(P_i)$ with $\deg(P_i) \approx C/\log(p)$. Hence, $\log(\mathrm{card}(\mathbb{K})) \approx C$.

1. First, we estimate the complexity of computing the resultant with respect to $x$ in $\mathbb{K}[x,y]$ (where $\mathbb{K} = \mathbb{F}_p[t]/(P_i)$). According to [18] (Corollary 11.18), this can be done in $\tilde{O}(w^3)$ operations in $\mathbb{K}$, and the degree of the resultant is $O(w^2)$.

2. The probabilistic Cantor-Zassenhaus algorithm [18] factors a polynomial of degree $n$ over a finite field $\mathbb{F}_q$ in $\tilde{O}(n^2 + n\log(q))$ arithmetic operations in $\mathbb{F}_q$. Therefore, the arithmetic complexity in $\mathbb{K}$ of the factorization of the resultant is

$\tilde{O}(w^4 + w^2\log(\mathrm{card}(\mathbb{K}))) = \tilde{O}(w^4 + w^2 C)$.


3. The degree of regularity of an ideal is an important indicator of the complexity of computing its Gröbner basis with respect to the grevlex ordering: it is the highest degree of the polynomials occurring in the F5 Algorithm. According to [mm], if an ideal is spanned by $m$ generic equations in $n$ variables, then the complexity of computing a Gröbner basis is:



Since the ideal $J = \langle m + z, f, X\rangle$ is generated by three independent equations, its degree of regularity can be estimated from the Macaulay bound (see [13]) as

$d_{reg}(J) = (\deg_{xy}(m + z) - 1) + (\deg_{xy}(f) - 1) + (\deg_{xy}(X) - 1) + 1.$

For practical parameters, $\deg_{xy}(m + z) \approx \deg_{xy}(f) \approx \deg_{xy}(X) \approx w$. Therefore, $d_{reg} \approx 3w$. The arithmetic complexity in $\mathbb{K}$ of the Gröbner basis computation is then:
4. Finally, we have a linear system to solve. The number of variables is $\mathrm{card}(\Lambda_m)$. For practical parameters, $\mathrm{card}(\Lambda_m) \approx k$, which is less than 1000 (the recommended parameter is $k = 3$). Hence, this step is negligible in practice compared to the Gröbner basis computation, since an overdetermined linear system with fewer than 1000 variables over a finite field can be solved easily. Furthermore, this step is analogous to the linear system solved in the legitimate decryption algorithm. Therefore, this step of the attack is faster than the decryption algorithm, which has to be efficient for practical parameters.

The cost of an arithmetic operation in $\mathbb{K}$ is quasi-linear in $\log(\mathrm{card}(\mathbb{K})) \approx C$. The number of times we have to run the main loop of the attack is $\mathrm{size}(m)/C$. The complexity of the CRT is $\tilde{O}(\mathrm{size}(m)\log(\mathrm{size}(m)))$ [18]. Putting all the steps together, we find the total complexity of the attack:

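Theorem 1. The total binary complexity of the Level 3 Attack is

$\underbrace{\tilde{O}(\mathrm{size}(m)\,w^3)}_{\text{resultant}} + \underbrace{\tilde{O}(\mathrm{size}(m)(w^4 + w^2 C))}_{\text{factorization}} + \underbrace{\tilde{O}(\mathrm{size}(m)\,w^6)}_{\text{Gröbner}} + \underbrace{\tilde{O}(\mathrm{size}(m))}_{\text{CRT}}.$

Hence, the total binary asymptotic complexity of the attack is upper bounded by

$\tilde{O}(w^6\,\mathrm{size}(m)).$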

Corollary 1. If we assume that $\mathrm{size}(m) \approx wd\log(p)$ (which is the case in practice), then the binary complexity of the attack is $\tilde{O}(dw^7\log(p))$.

Consequently, the attack is polynomial in all the security parameters and it is quasi-linear in the size of the secret key, which is $2d\log(p)$. Note also that the parameter $k$ has little effect on the complexity of the attack.

A Lower Bound on the Complexity of the Decryption Algorithm

The complexity of this attack has to be compared with a lower bound on the cost of the decryption process. During the decryption algorithm, one has to factor $(F_0 - F_1)(u_x(t), u_y(t), t)$ over $\mathbb{F}_p[t]$. The degree of this polynomial is at least $dw$. To the best of our knowledge, the best probabilistic factorization algorithms have an arithmetic complexity of $\tilde{O}(d^2w^2 + dw\log(p))$ [15]. Moreover, there is also a knapsack to solve after the factorization. The complexity of this step is difficult to estimate, so we do not consider it here (recall that we are trying to establish a lower bound). The last step of the decryption process is the resolution of a linear system with $O(dw)$ variables: the arithmetic complexity of this step is $O(w^3d^3)$. Finally, the total binary complexity of the decryption algorithm is (loosely) lower bounded by $\tilde{O}(\log(p)(w^3d^3 + dw\log(p)))$, which is cubic in the parameters $d$ and $w$, and quadratic in $\log(p)$. In comparison, the attack is quasi-linear in $d$ and $\log(p)$, and polynomial of degree 7 in $w$.
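To illustrate what "quasi-linear versus cubic" means as $d$ grows, the following Python sketch compares only the leading terms of the two bounds above, with constants and logarithmic factors dropped. The absolute values are therefore meaningless; only their growth with $d$ is informative. The function names are ours.

```python
import math

def attack_cost(d, w, p):
    """Leading term of Corollary 1: d * w^7 * log(p) (constants and log factors dropped)."""
    return d * w**7 * math.log2(p)

def decryption_lower_bound(d, w, p):
    """Leading terms of the decryption lower bound log(p) * (w^3 d^3 + d w log(p))."""
    lp = math.log2(p)
    return lp * (w**3 * d**3 + d * w * lp)

degrees = [50, 100, 200, 400, 800]
ratios = [decryption_lower_bound(d, 5, 2) / attack_cost(d, 5, 2) for d in degrees]
for d, r in zip(degrees, ratios):
    print(f"d = {d:4d}: decryption / attack = {r:.1f}")
```

With $w = 5$ and $p = 2$, the ratio roughly quadruples each time $d$ doubles, i.e. it grows like $d^2$: enlarging the secret section penalizes the legitimate user much more than the attacker.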

6 Experimental Results

Workstation. The experimental results were obtained on a Xeon bi-processor at 3.2 GHz with 64 GB of RAM. The instances of ASC were generated with Magma 2.15-7. To compute the Gröbner bases, we use the F4 [7] implementation in Magma.

To generate our instances, we pick $\ell, d \in \mathbb{N}$ and consider the following parameters:

- $w = 2\ell + 5$.

- $\Lambda_m = \{(4 + \ell, 4 + \ell), (0,0)\}$.

- $\Lambda_X = \{(3 + \ell, 2 + \ell), (1 + \ell, 2 + \ell), (0,0)\}$.

- $\Lambda_f = \{(5 + \ell, 5 + \ell), (1 + \ell, 2 + \ell), (1,2), (0,0)\}$.

- $\forall (i,j) \in \Lambda_m$, $d^{(m)}_{ij} = (2\ell + 5)d + 21$.

- $\forall (i,j) \in \Lambda_f$, $d^{(f)}_{ij} = (2\ell + 5)d + 22$.

Construction of X, $u_x$ and $u_y$

$u_x, u_y \in \mathbb{F}_p[t]$ are random polynomials of degree $d$.

To construct $X(x,y,t)$, we pick two random polynomials $R_1, R_2 \in \mathbb{F}_p[t]$ of degree 20 and we consider

$X = R_1(t)\,(x^{3+\ell}y^{2+\ell} - u_x(t)^{3+\ell}u_y(t)^{2+\ell}) + R_2(t)\,(x^{1+\ell}y^{2+\ell} - u_x(t)^{1+\ell}u_y(t)^{2+\ell}).$

Then we verify that $X(x,y,t)$ is irreducible. If not, we restart by picking another $R_1$ and another $R_2$.

Table 1 shows the complexity of the Level 3 Attack for different values of $p$ and $d$. Each entry in the table is obtained by averaging the results over 20 random instances of the cryptosystem.

Table Notations

$t_{res}$ denotes the time used for the computation of the resultant, $t_{fact}$ is the time used by the factorization of the resultant, and $t_{GB}$ denotes the cost of the Gröbner basis computation. The time for solving the linear system and for the recombination by the CRT is negligible and is therefore not given in the table. According to [5], there was no known attack better than exhaustive search when $d > 50$ and $w > 5$. Therefore, the security bound is the cost of the exhaustive search of the secret section, namely $p^{2d+2}$.

Interpretation of the Results

It is worth remarking that the first line of Table 1 corresponds to the parameters recommended by the designers [3], which are broken in 0.05 seconds. The major observation is that the complexity of the attack behaves as predicted by the complexity analysis: it is quasi-linear in the parameter $d$. We also ran some experiments to study the impact of the parameter $k$ (the cardinality of the support of the surface $X$) on the complexity: as expected, increasing $k$ has very little effect on the cost of the attack. To summarize, Table 1 shows that trying to secure the system by increasing the size of the secret key (that is to say, by increasing the parameters $p$ and $d$) is pointless: even when the size of the secret key is bigger than 10000 bits, the system can be broken in a few seconds.
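One way to check the quasi-linear behaviour on the published data is to fit a power law $t_{total} \approx c\,d^{\alpha}$ to the $p = 2$ rows of Table 1. The Python sketch below performs a least-squares fit of $\log t_{total}$ against $\log d$; it is our own post-processing of the timings, not part of the original experiments.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. alpha in y ~ c * x**alpha."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# p = 2, w = 5, k = 3 rows of Table 1: degree d of the secret section and total time (s).
d_values = [50, 100, 200, 400, 800, 1600, 2000, 5000]
t_total = [0.05, 0.07, 0.15, 0.30, 0.65, 1.0, 1.3, 3.0]
alpha_d = loglog_slope(d_values, t_total)
print(f"fitted exponent in d: {alpha_d:.2f}")
```

On these eight rows the fitted exponent comes out a little below 1, consistent with the quasi-linear prediction (timings of a few hundredths of a second are of course noisy).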
Table 1. Level 3 Attack - Experimental results with w = 5

p      d     w  k  public key   secret key  t_res  t_fact  t_GB   t_total  security bound
2      50    5  3  310 bits     102 bits    0.02s  0.02s   0.01s  0.05s    2^102
2      100   5  3  560 bits     202 bits    0.03s  0.02s   0.02s  0.07s    2^202
2      200   5  3  1060 bits    402 bits    0.05s  0.05s   0.05s  0.15s    2^402
2      400   5  3  2060 bits    802 bits    0.1s   0.1s    0.1s   0.30s    2^802
2      800   5  3  4060 bits    1602 bits   0.2s   0.2s    0.2s   0.65s    2^1602
2      1600  5  3  8060 bits    3202 bits   0.3s   0.3s    0.4s   1.0s     2^3202
2      2000  5  3  10060 bits   4002 bits   0.45s  0.4s    0.4s   1.3s     2^4002
2      5000  5  3  25060 bits   10002 bits  0.8s   1.3s    0.8s   3.0s     2^10002
17     50    5  3  1267 bits    409 bits    0.2s   2.4s    0.4s   3.0s     2^409
17     100   5  3  2289 bits    818 bits    0.3s   5.1s    0.6s   3.0s     2^818
17     400   5  3  8420 bits    3270 bits   1.45s  27.7s   3.9s   33.1s    2^3270
17     800   5  3  16595 bits   6500 bits   3.1s   70s     9.5s   83s      2^6500
10007  500   5  3  34019 bits   13289 bits  29s    217s    64s    310s     2^13289


The Parameter w

In order to secure the system, one could think of increasing the parameter $w$, since the attack is in $\tilde{O}(w^7)$. However, we showed that the complexity of the decryption algorithm is lower bounded by $\tilde{O}(w^3)$. Consequently, the parameter $w$ should not be too high if the owner of the secret key wants to be able to decrypt. Table 2 gives the experimental results of the attack when $w$ increases.

Interpretation of the Results

The main observation is that the complexity of the attack still behaves as predicted: when $w$ is increased, the Gröbner basis computation is the most expensive step. Increasing $w$ seems to be the best countermeasure against the attack. However, note that the attack is still feasible in practice, even when the public key is big.
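The same kind of least-squares fit can be applied to the Gröbner basis timings $t_{GB}$ of Table 2 as a function of $w$. The sketch below is our own post-processing; it skips the $w = 5$ row, whose 0.01 s timing is too small to be meaningful. Since $\mathrm{size}(m)$ also grows linearly with $w$ in these instances ($d^{(m)}_{ij} = wd + 21$), Theorem 1 gives an upper bound of order $w^7$ for this step.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. alpha in y ~ c * x**alpha."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# p = 2, d = 50, k = 3 rows of Table 2 with w >= 15: parameter w and Groebner basis time (s).
w_values = [15, 25, 35, 45, 55, 65, 75]
t_gb = [4.4, 32, 260, 1352, 4619, 12408, 37900]
alpha_w = loglog_slope(w_values, t_gb)
print(f"fitted exponent in w: {alpha_w:.2f}")
```

On these data the fitted exponent lies between 5 and 6, which is compatible with the $w^7$ upper bound.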


Table 2. Level 3 Attack - Experimental results: increasing w

p  d   w   k  public key  secret key  t_res  t_fact  t_GB    t_LinSys  t_total  security bound
2  50  5   3  310 bits    102 bits    0.02s  0.02s   0.01s   0.001s    0.05s    2^102
2  50  15  3  810 bits    102 bits    0.7s   0.3s    4.4s    0.03s     5.4s     2^102
2  50  25  3  1310 bits   102 bits    3s     1s      32s     0.2s      37s      2^102
2  50  35  3  1810 bits   102 bits    10s    3s      260s    1s        274s     2^102
2  50  45  3  2310 bits   102 bits    30s    7s      1352s   4s        1393s    2^102
2  50  55  3  2810 bits   102 bits    70s    12s     4619s   13s       4714s    2^102
2  50  65  3  3310 bits   102 bits    147s   22s     12408s  27s       12604s   2^102
2  50  75  3  3810 bits   102 bits    288s   38s     37900s  56s       38280s   2^102
7 Conclusion

In this paper, we analyze the security of the PKC’2009 Algebraic Surface Cryptosystem. We provide three variants of a message recovery attack. We also estimate very precisely the complexity of the Level 3 Attack and we show that it is polynomial in all the parameters of the system. Furthermore, it is quasi-linear in the size of the secret key, whereas the proposed decryption algorithm is cubic.

Experimental results confirm the theoretical analysis. We show that the attack can easily break ASC with the recommended parameters. The best choice to try to secure ASC against the attack is to take $p$ and $d$ as small as possible ($p = 2$ and $d = 50$) and to increase $w$. However, our implementation is polynomial in $w$ and can break the system in a few hours, even when $w = 75$ (this value should be compared to the initially recommended $w = 5$).

Therefore, we consider the system fully broken, but we believe that the section finding problem is still an interesting problem; in this paper, we have simply shown how to avoid solving it in the context of ASC.

Acknowledgements. We wish to thank the anonymous referees for their helpful comments and suggestions. We are also thankful to Maki Iwami for useful discussions.
A Gröbner Bases and Normal Form

In this section, we briefly describe some fundamental tools from commutative algebra that are useful for the attack presented in this paper. For a more complete presentation of these tools, the reader can refer to jbllj.

From now on, $K$ is a field and $R$ denotes the ring $K[x_1, \dots, x_n]$. We fix an admissible monomial ordering $<$; for the attack we use the grevlex (graded reverse lexicographical) ordering.

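Definition 1 (Grevlex ordering). The grevlex ordering is defined as follows. Let $m_1 = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$ and $m_2 = x_1^{\beta_1} \cdots x_n^{\beta_n}$ be two monomials. Then $m_1 > m_2$ if

- $\sum_{i=1}^{n} \alpha_i > \sum_{i=1}^{n} \beta_i$, or

- $\sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \beta_i$ and the rightmost nonzero entry of $(\alpha_1 - \beta_1, \dots, \alpha_n - \beta_n)$ is negative.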
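Definition 1 translates directly into a sort key. The following Python sketch (the helper names `grevlex_key` and `grevlex_greater` are ours, not from the paper) encodes a monomial by its exponent tuple and checks the classic case where grevlex and graded lexicographic order disagree.

```python
def grevlex_key(alpha):
    """Sort key for grevlex on exponent tuples (Definition 1).

    A larger key means a larger monomial: total degree is compared first;
    ties go to the monomial for which the rightmost nonzero entry of
    alpha - beta is negative, i.e. we compare the negated, reversed
    exponent vectors lexicographically.
    """
    return (sum(alpha), tuple(-a for a in reversed(alpha)))


def grevlex_greater(alpha, beta):
    """Return True if x^alpha > x^beta under grevlex."""
    return grevlex_key(alpha) > grevlex_key(beta)


# Variables ordered x > y > z; exponent tuples are (a_x, a_y, a_z).
y3 = (0, 3, 0)    # y^3
xz2 = (1, 0, 2)   # x z^2
print(grevlex_greater(y3, xz2))   # True, although x z^2 > y^3 in graded lex
print(sorted([(2, 0), (1, 1), (0, 2), (1, 0)], key=grevlex_key, reverse=True))
# [(2, 0), (1, 1), (0, 2), (1, 0)], i.e. x^2 > xy > y^2 > x
```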

For any polynomial $P \in R$, $\mathrm{LM}(P)$ denotes its leading monomial with respect to $<$. For any ideal $I \subset R$, $\mathrm{LM}(I)$ denotes the ideal generated by $\{\mathrm{LM}(P) : P \in I\}$.

Definition 2 (Normal form). Let $I$ be an ideal of $R$, and $f \in R$ be a polynomial. Then there exist unique polynomials $h, g \in R$ such that $g \in I$, $f = h + g$ and no monomial of $h$ is in $\mathrm{LM}(I)$. Then $h$ is called the normal form of $f$ with respect to $I$ and $<$, and is denoted $\mathrm{NF}_I(f)$.
The normal form is a $K$-linear map, and its main property is the following:

Proposition 1. Let $I$ be an ideal of $R$, and $f \in R$ be a polynomial. Then $f \in I$ if and only if $\mathrm{NF}_I(f) = 0$.

To be able to compute the normal form, we need another fundamental tool: Gröbner bases.

Definition 3 (Gröbner basis). Let $I$ be an ideal of $R$. A finite subset of polynomials $G \subset I$ is called a Gröbner basis of $I$ (with respect to the monomial ordering $<$) if $\langle \mathrm{LM}(G) \rangle = \mathrm{LM}(I)$.
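Proposition 1 is what makes normal forms useful: ideal membership reduces to a normal form computation, and once a Gröbner basis G of I is known, NF_I(f) is the remainder of multivariate division of f by G. The Python sketch below implements that division over Q with exact fractions. The polynomial representation (dicts from exponent tuples to coefficients) and the toy ideal I = ⟨x − 1, y − 2⟩ are our own illustrative choices; the two generators form a Gröbner basis because their leading monomials x and y are coprime, so the normal form of any f is the constant f(1, 2).

```python
from fractions import Fraction


def grevlex_key(alpha):
    """Larger key = larger monomial under grevlex (see Definition 1)."""
    return (sum(alpha), tuple(-a for a in reversed(alpha)))


def leading_term(p):
    """Exponent tuple and coefficient of the grevlex-leading term of p."""
    alpha = max(p, key=grevlex_key)
    return alpha, p[alpha]


def divides(beta, alpha):
    """True if the monomial x^beta divides x^alpha."""
    return all(b <= a for b, a in zip(beta, alpha))


def normal_form(f, G):
    """Remainder of f on multivariate division by the list G.

    Polynomials are dicts {exponent tuple: coefficient}. If G is a Groebner
    basis of I, the remainder equals NF_I(f); an empty dict means 0.
    """
    p = {m: Fraction(c) for m, c in f.items() if c != 0}
    remainder = {}
    while p:
        alpha, c = leading_term(p)
        for g in G:
            beta, d = leading_term(g)
            if divides(beta, alpha):
                # Cancel the leading term of p with (c/d) * x^(alpha-beta) * g.
                shift = tuple(a - b for a, b in zip(alpha, beta))
                q = c / Fraction(d)
                for gm, gc in g.items():
                    m = tuple(s + e for s, e in zip(shift, gm))
                    p[m] = p.get(m, Fraction(0)) - q * gc
                    if p[m] == 0:
                        del p[m]
                break
        else:
            # No leading monomial of G divides it: move the term to the remainder.
            remainder[alpha] = c
            del p[alpha]
    return remainder


# Toy ideal I = <x - 1, y - 2> in Q[x, y]; exponent tuples are (a_x, a_y).
G = [{(1, 0): 1, (0, 0): -1}, {(0, 1): 1, (0, 0): -2}]
print(normal_form({(1, 1): 1}, G))                 # {(0, 0): Fraction(2, 1)}
print(normal_form({(1, 1): 1, (0, 0): -2}, G))     # {}, so x*y - 2 lies in I
```

Only for a Gröbner basis is this remainder independent of the order in which the divisors are tried; in the attack itself, this step is performed by Magma's `Groebner` and `NormalForm`, as in the code of Appendix B.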


B Magma Code for the Level 1 Attack 

In the following piece of code, p and d are the parameters of the system, deg_t is the degree of m with respect to t, and Lambda_m denotes the support of m (these values are public). F0 and F1 are the ciphertext, and X is the public surface.


R<x,y,t> := PolynomialRing(GF(p), 3, "grevlex");

Res := Resultant(R!(F0 - F1), R!X, x);   // Eliminate x
F := Factorization(Res);                 // Factor the resultant
// Pick the irreducible factor of highest degree in y
maxdeg := Max([Degree(R!f[1], R!y) : f in F]);
exists(Q0){f[1] : f in F | Degree(R!f[1], R!y) eq maxdeg};
J := Ideal([R!Q0, R!X, R!F0, R!F1]);
Groebner(J);                             // Compute the Groebner basis of J

Coeffm := PolynomialRing(GF(p), #Lambda_m*(deg_t+1));
R2<x,y,t> := PolynomialRing(Coeffm, 3);

// Construct the linear system
plaintext := &+[Coeffm.((i-1)*(deg_t+1)+j) *
    R2!NormalForm(R!x^Lambda_m[i][1] * R!y^Lambda_m[i][2] * R!t^(j-1), J) :
    i in [1..#Lambda_m], j in [1..deg_t+1]];

// Solve the linear system
V := Variety(Ideal(Coefficients(plaintext)));
Abstract. We present an elementary method to construct optimized lattices that are used for finding small roots of polynomial equations. Former methods first construct some large lattice in a generic way from a polynomial $f$ and then optimize it by finding suitable smaller-dimensional sublattices. In contrast, our method focuses on optimizing $f$ first, which then directly leads to an optimized small-dimensional lattice.

Using our method, we construct the first elementary proof of the Boneh-Durfee attack for small RSA secret exponents with $d < N^{0.292}$. Moreover, we identify a sublattice structure behind the Jochemsz-May attack for small CRT-RSA exponents $d_p, d_q < N^{0.073}$. Unfortunately, in contrast to the Boneh-Durfee attack, for the Jochemsz-May attack the sublattice does not help to improve the bound asymptotically. Instead, we are able to attack much larger values of $d_p, d_q$ in practice by LLL-reducing smaller-dimensional lattices.

Keywords: linearization, lattices, small roots, small secret exponent, RSA, CRT-RSA.

1 Introduction 

The RSA cryptosystem is currently the most widely deployed cryptosystem. To perform a decryption or signature generation, an element $x \in \mathbb{Z}_N$ is raised to the $d$-th power, where $d \in \mathbb{Z}_{\phi(N)}$ is the secret key. In order to speed up this process, one might be tempted to use a small value of $d$. However, once $d < N^{0.25}$, Wiener [Wie90] showed, using a continued fraction approach, that $d$ can be reconstructed from just the public parameters $e$ and $N$ in polynomial time. This result has been further improved by Boneh and Durfee to $d < N^{0.292}$ using a lattice-based technique [BD99].
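Wiener's continued fraction approach can be sketched in a few lines. The Python sketch below is a textbook variant written for illustration (not code from [Wie90]): it tries each convergent k/d of e/N, derives a candidate φ(N) = (ed − 1)/k, and accepts it if p + q = N − φ(N) + 1 leads to integer roots of z² − (p + q)z + N. The toy key N = 90581 = 239 · 379, e = 17993, d = 5 is a commonly used example; real keys are of course far larger.

```python
from math import isqrt


def continued_fraction(num, den):
    """Partial quotients of num/den."""
    quotients = []
    while den:
        a = num // den
        quotients.append(a)
        num, den = den, num - a * den
    return quotients


def convergents(quotients):
    """Yield the convergents of a continued fraction as (numerator, denominator)."""
    num_prev, num = 0, 1
    den_prev, den = 1, 0
    for a in quotients:
        num_prev, num = num, a * num + num_prev
        den_prev, den = den, a * den + den_prev
        yield num, den


def wiener(e, N):
    """Return d if some convergent k/d of e/N reveals phi(N), else None."""
    for k, d in convergents(continued_fraction(e, N)):
        if k == 0 or (e * d - 1) % k:
            continue
        phi = (e * d - 1) // k      # candidate phi(N)
        s = N - phi + 1             # candidate p + q
        disc = s * s - 4 * N        # equals (p - q)^2 if phi is correct
        if s > 0 and disc >= 0:
            r = isqrt(disc)
            if r * r == disc and (s + r) % 2 == 0:
                return d
    return None


print(wiener(17993, 90581))   # 5
```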

Another possibility to speed up decryption and signature generation was proposed by Quisquater and Couvreur [QC82]. They make use of the knowledge of the prime factorization of $N = pq$ to compute $x^d$ modulo $p$ and modulo $q$

* This research was supported by the German Research Foundation (DFG) as part 
of the project MA 2536/3-1 and by the European Commission through the ICT 
programme under contract ICT-2007-216676 ECRYPT II. 


P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010, LNCS 6056, pp. 53-69, 2010.
© International Association for Cryptologic Research 2010











M. Herrmann and A. May 


and finally combine the results using the Chinese Remainder Theorem. This process is approximately 4 times faster than a standard decryption. To further lower the number of required operations, one can additionally use small CRT exponents, i.e., one can choose $d$ such that $d_p = d \bmod (p-1)$ and $d_q = d \bmod (q-1)$ are both small.
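The CRT decryption of Quisquater and Couvreur can be sketched in Python as below. The recombination uses Garner's formula, one standard way of applying the Chinese Remainder Theorem here (the text does not specify which variant is used), and the toy key p = 61, q = 53, e = 17, d = 2753 is only for illustration. The exponents d_p and d_q computed here are exactly the CRT exponents that the attacks in this paper target when they are small.

```python
def crt_rsa_decrypt(c, p, q, d):
    """Decrypt c with the CRT exponents d_p = d mod (p-1), d_q = d mod (q-1)."""
    dp = d % (p - 1)
    dq = d % (q - 1)
    mp = pow(c, dp, p)              # c^{d_p} mod p
    mq = pow(c, dq, q)              # c^{d_q} mod q
    q_inv = pow(q, -1, p)           # q^{-1} mod p (Python 3.8+)
    h = (q_inv * (mp - mq)) % p     # Garner's recombination step
    return mq + h * q               # the unique value mod N = pq


p, q, e, d = 61, 53, 17, 2753       # toy key, far too small for real use
N = p * q
c = pow(65, e, N)
print(c, crt_rsa_decrypt(c, p, q, d))   # 2790 65
```

The two half-size exponentiations with half-size exponents are where the speed-up comes from; the recombination costs only one modular multiplication.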

At Crypto '07, Jochemsz and May [JM07] proposed the first polynomial-time attack on CRT exponents that are smaller than $N^{0.073}$. However, the experimental results of Jochemsz and May for small-dimensional lattices are much better than theoretically predicted. For example, using a lattice dimension of 56, the attack should theoretically not work at all, while in practice this lattice dimension is sufficient to reconstruct private keys up to a size of $N^{0.01}$. Such a discrepancy between theoretically predicted and practically achieved results is a strong indication that the involved lattice structure is not optimal. This led Jochemsz and May to conjecture that an analysis of sublattice structures could lead to a theoretically superior bound.

In this paper we propose a method that can be applied to attack small CRT-exponents. Our new approach leads to smaller-dimensional lattices than in the Jochemsz-May attack and fully explains the gap between the practical results of Jochemsz and May and their theoretical analysis. Unfortunately, our analysis shows that our smaller-dimensional lattices asymptotically lead to the same bound $N^{0.073}$ as in [JM07], thereby answering the conjecture of Jochemsz and May that sublattices improve the bound in the negative.

Although we do not achieve an asymptotic improvement, our new approach enables us to attack much larger values of $d_p, d_q$ in practice than [JM07], by using smaller-dimensional lattices. We implemented our algorithm and showed that, e.g., for a 2000-bit $N$ we can efficiently recover 47-bit $d_p, d_q$, whereas the technique of [JM07] only allows recovering about 35-bit $d_p, d_q$ in a comparable amount of time.

Our method is lattice-based and uses the technique of unravelled linearization introduced by Herrmann and May at Asiacrypt '09 [HM09], which can be seen as a hybrid method between usual linearization and Coppersmith's method [Cop97]. The central idea of unravelled linearization is to perform, as a first step, a linearization of the initial polynomial and to keep the induced relations of the linearization in mind. These relations are afterwards used in a second step, where we back-substitute in order to eliminate some monomials, thereby partially unravelling the first linearization step. In order to explicitly compute the induced relations, we propose to use a Gröbner basis computation.

We illustrate the technique of unravelled linearization by showing the first elementary proof of the Boneh-Durfee bound $d < N^{0.292}$ for small secret RSA exponents. In our framework, optimizing the bounds is a simple task. Therefore, we conjecture that the Boneh-Durfee bound cannot be improved unless a different polynomial equation is used.

The rest of the paper is organized as follows: In Section 2 we review some basic results from lattice theory. Section 3 describes the method of unravelled linearization for the case of small RSA exponents $d$ with a proof of $d < N^{0.292}$. We then apply our method to attack small CRT exponents in Section 4, where we achieve the Jochemsz-May bound of $N^{0.073}$ with smaller-dimensional lattices. In Section 5, we demonstrate that our improved lattices allow for much better practical results in attacking small CRT-exponents.

2 Basics 

Before we explain the details of unravelled linearization and how to use it to improve the analysis of small CRT-exponents, we give some necessary background information on lattice theory and the lattice-based method of Coppersmith [Cop97].

A lattice is a discrete additive subgroup of $\mathbb{R}^n$. That is, for a set of linearly independent basis vectors $b_1, \dots, b_{\dim} \in \mathbb{R}^n$, $\dim \le n$, the set

$$L = \Big\{ x \in \mathbb{R}^n \;\Big|\; x = \sum_{i=1}^{\dim} a_i b_i \text{ with } a_i \in \mathbb{Z} \Big\}$$

is called a lattice. One can describe a lattice by its basis matrix $B$, where we write the vectors $b_i$ as row vectors.

Let $L$ be a lattice with basis $b_1, \dots, b_{\dim}$, and let $b_1^*, \dots, b_{\dim}^*$ be the result of applying Gram-Schmidt orthogonalization to the basis vectors. Then the determinant of $L$ is defined as $\det(L) = \prod_{i=1}^{\dim} \|b_i^*\|$. For a lattice of full rank, i.e., $\dim = n$, the determinant of the lattice equals the absolute value of the determinant of a lattice basis matrix.
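As a small sanity check of this definition, the Python sketch below (helper names are ours) runs classical Gram-Schmidt over the rationals and returns $\det(L)^2 = \prod_i \|b_i^*\|^2$, which avoids square roots. For the full-rank basis with rows (2, 0) and (1, 3) it returns 36, i.e., $\det(L) = 6 = |\det B|$; it also works for lattices that are not of full rank.

```python
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(basis):
    """Classical Gram-Schmidt orthogonalization of the rows of `basis`."""
    ortho = []
    for b in basis:
        v = [Fraction(x) for x in b]
        for u in ortho:
            mu = Fraction(dot(b, u)) / dot(u, u)   # projection coefficient
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho


def lattice_det_squared(basis):
    """det(L)^2 = product of ||b_i*||^2 for linearly independent rows."""
    result = Fraction(1)
    for v in gram_schmidt(basis):
        result *= dot(v, v)
    return result


print(lattice_det_squared([[2, 0], [1, 3]]))   # 36, so det(L) = 6
print(lattice_det_squared([[1, 1, 0]]))        # 2, a rank-1 lattice in R^3
```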

Lattices have proved to be very useful in cryptanalysis, mostly because of a powerful and efficient lattice reduction algorithm due to Lenstra, Lenstra and Lovász [LLL82]. This so-called LLL algorithm outputs an approximation of a shortest lattice vector in time polynomial in the bit-length of the entries of the basis matrix and in the lattice dimension dim. Using the LLL algorithm as a building block, Coppersmith [Cop96a, Cop96b] designed a rigorous algorithm that allows one to efficiently compute small roots of bivariate polynomials over the integers or of univariate modular polynomials. Additionally, he gave a heuristic extension to multivariate polynomials.

Coppersmith's idea is to construct, on input some polynomial $f$, a set of coprime polynomials that share the same small roots over the integers. Then one can use standard elimination and root-finding techniques to extract these roots. Howgrave-Graham [HG97] gave a simple reformulation of Coppersmith's method that defines the following condition.

Theorem 1 (Howgrave-Graham). Let $g(x_1, \dots, x_k)$ be a polynomial in $k$ variables with $n$ monomials. Furthermore, let $m$ be a positive integer. Suppose that

1. $g(r_1, \dots, r_k) = 0 \bmod b^m$, where $|r_i| < X_i$, $i = 1, \dots, k$, and
2. $\|g(x_1X_1, \dots, x_kX_k)\| < \frac{b^m}{\sqrt{n}}$,

where the norm of $g$ is defined as the Euclidean norm of its coefficient vector. Then $g(r_1, \dots, r_k) = 0$ holds over the integers.
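The reason Theorem 1 holds is a short chain of inequalities. We sketch the standard argument, writing $g = \sum_{j=1}^{n} c_j x^{a_j}$ with multi-indices $a_j$ (notation ours), $r = (r_1, \dots, r_k)$ and $X = (X_1, \dots, X_k)$:

$$|g(r)| \;\le\; \sum_{j=1}^{n} |c_j|\,|r^{a_j}| \;\le\; \sum_{j=1}^{n} |c_j|\, X^{a_j} \;\le\; \sqrt{n}\,\|g(x_1X_1, \dots, x_kX_k)\| \;<\; b^m,$$

where the third step is the Cauchy-Schwarz inequality applied to the coefficient vector $(c_j X^{a_j})_j$ of $g(x_1X_1, \dots, x_kX_k)$, and the last step is condition 2. By condition 1, $b^m$ divides $g(r)$; since $|g(r)| < b^m$, the only possibility is $g(r) = 0$.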
3 Unravelled Linearization and the Boneh-Durfee Attack

In this section, we apply the method of unravelled linearization, introduced by Herrmann and May [HM09], to attack RSA with small secret exponent $d$. This will lead to an elementary proof of the Boneh-Durfee bound $d < N^{0.292}$.

In 1999, Boneh and Durfee [BD99] showed with a lattice-based Coppersmith-type attack that private RSA keys smaller than $N^{0.284}$ can be recovered in polynomial time. The attack's running time is dominated by LLL-reducing some large-dimensional lattice basis $B$, whose dimension depends on $\epsilon$. It turns out that the associated lattice $L(B)$ contains a smaller-dimensional sublattice $L'$ that allows one to show an improved bound of $N^{0.292-\epsilon}$.

The identification and analysis of this sublattice $L'$, however, is a complicated task due to the fact that its lattice basis is no longer triangular and, therefore, the computation of the lattice determinant $\det(L')$ is much more involved. For the analysis of $\det(L')$, Boneh and Durfee developed a notion called geometrically progressive matrices that allowed for handling these non-triangular lattice bases. Blömer and May [BM01] followed a different approach and showed that, asymptotically, it does not influence the determinant if some specific columns are removed. This allowed them to rebuild some triangular structure of the basis matrix. Both approaches are, however, quite complex methods for optimizing lattice bases.

As opposed to the methods of [BD99] and [BM01], our new approach does not manipulate a basis matrix but rather the underlying polynomial from which a basis matrix is derived. This directly leads to a low-dimensional sublattice with a basis of triangular structure that allows for an easy determinant calculation.

The method of our choice for this task is the technique of unravelled linearization [HM09]. However, before we introduce our method, we briefly recall the original Boneh-Durfee attack in order to illustrate the similarities and differences.

The polynomial to be analyzed is derived from the RSA key equation $ed = 1 \bmod \phi(N)$. Rewrite this as

$$ed = 1 + x\phi(N) \iff ed = 1 + x\big(\underbrace{N + 1}_{A} + \underbrace{(-p - q)}_{y}\big)$$

and search for small modular roots of the polynomial

$$f(x, y) := 1 + x(A + y) \bmod e.$$

Therefore, we fix an integer $m$ and define the polynomials

$$g_{i,k}(x, y) := x^i f^k e^{m-k} \quad \text{and} \quad h_{j,k}(x, y) := y^j f^k e^{m-k}.$$

A lattice basis is constructed by using the coefficient vectors of the so-called x-shifts $g_{i,k}(xX, yY)$ for $k = 0, \dots, m$ and $i = 0, \dots, m-k$ as basis vectors.
[Figure 1: Boneh-Durfee lattice basis for m = 2, t = 1; columns indexed by the monomials $1, x, xy, x^2, x^2y, x^2y^2, y, xy^2, x^2y^3$.]



The values $X$ and $Y$ denote upper bounds on the sizes of the solutions. Additionally, we use the so-called y-shifts $h_{j,k}(xX, yY)$ for $k = 0, \dots, m$ and $j = 1, \dots, t$, where $t$ is some parameter that has to be optimized. Figure 1 shows an example for the parameters $m = 2$ and $t = 1$. Note that the coefficient vectors of the shift polynomials $g_{i,k}(xX, yY)$ and $h_{j,k}(xX, yY)$ are written as row vectors.

Boneh and Durfee's improved analysis showed that one obtains superior values for $X$ and $Y$ if one takes only a subset of the y-shifts. For our example, this means we exclude $ye^2$ and $yfe$. Hence, the resulting lattice basis is no longer triangular and, therefore, deriving a closed determinant formula for general $m$ and $t$ is a complex task.

We now use the technique of unravelled linearization to construct a lattice basis that yields the best known asymptotic bound $N^{0.292}$ and yet retains a triangular lattice basis.

The first step in the process is to perform a suitable linearization of the original polynomial. In our case, we glue together the monomials in the following way:

$$\underbrace{1 + xy}_{u} + Ax \bmod e.$$



This leaves us with the linear polynomial $f(u, x) = u + Ax$ and, additionally, the relation $xy = u - 1$ derived from the substitution. Although Coppersmith's method is a construction method suited for polynomial equations and does not give improved bounds in the case of linear equations, we now construct a lattice basis using exactly the same x-shifts as in the original Boneh-Durfee attack. I.e., we construct the polynomials

$$g_{i,k}(u, x) := x^i f^k e^{m-k} \quad \text{for } k = 0, \dots, m \text{ and } i = 0, \dots, m-k, \tag{1}$$

and use their coefficient vectors as basis vectors. One can show that this leads to the Wiener bound of $N^{0.25}$.
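To make the linearization concrete, the Python sketch below evaluates both polynomials on a toy RSA key (p = 61, q = 53, e = 17, d = 2753, our own illustrative choice, with d far from small). It computes the unknowns $x$ (from $ed = 1 + x\phi(N)$), $y = -(p+q)$ and $u = 1 + xy$ with $A = N + 1$, and checks that $f(x, y) = 1 + x(A + y)$ and the linearized $f(u, x) = u + Ax$ both equal $ed$, hence vanish modulo $e$.

```python
p, q, e, d = 61, 53, 17, 2753       # toy RSA key (illustrative only)
N = p * q
phi = (p - 1) * (q - 1)
assert (e * d) % phi == 1

A = N + 1
x = (e * d - 1) // phi    # the unknown x from ed = 1 + x*phi(N)
y = -(p + q)              # the unknown y, so that A + y = phi(N)
u = 1 + x * y             # linearization variable u := 1 + xy

f_xy = 1 + x * (A + y)    # original polynomial f(x, y)
f_ux = u + A * x          # linearized polynomial f(u, x) = u + Ax

print(x, y, u)                            # 15 -114 -1709
print(f_xy == f_ux == e * d, f_ux % e)    # True 0
```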

However, if we also include y-shifts of the form $h_{j,k}(u, x, y) := y^j f^k e^{m-k}$, then we obtain a benefit. This may sound strange at first glance, since the monomial $y$ is not even present in our new polynomial $f(u, x)$. The reason for the improved bound becomes clear when we incorporate the induced relation $xy = u - 1$ and use it to substitute each occurrence of $xy$ by the term $u - 1$.

Fig. 2. Boneh-Durfee lattice for m = 2, t = 1 using unravelled linearization

The advantage can be seen by comparing the shift $yf^2$ from the original analysis, where $f(x, y) = 1 + x(A + y)$, with the new shift $yf^2$, where $f(u, x) = u + Ax$. As noted previously, the improved analysis uses only the shift $yf^2$ and neither $yfe$ nor $ye^2$. But $yf^2$ introduces three new monomials $y$, $xy^2$ and $x^2y^3$ into the Boneh-Durfee lattice basis, thereby destroying the triangular structure.

Let us compare this with our new unravelled linearization approach, which we depicted in Figure 2 for the same parameters $m = 2$ and $t = 1$. The shift $yf^2$ introduces the monomials $x^2y$, $uxy$ and $u^2y$. We replace each occurrence of $xy$ by $u - 1$, i.e., we replace $x^2y$ by $ux - x$ and $uxy$ by $u^2 - u$. But the monomials $ux$, $x$, $u^2$ and $u$ are already present in the lattice basis. Thus, the only new monomial that comes from the shift $yf^2$ is $u^2y$, thereby retaining the triangular structure.
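This monomial bookkeeping can be checked mechanically. The pure-Python sketch below (our own helper functions, not the authors' code) expands $y(u + Ax)^2$ with $A$ kept as a formal variable, rewrites every occurrence of $xy$ as $u - 1$, and compares the resulting $(u, x, y)$-monomials with those of the x-shifts $x^i f^k$ for $m = 2$; the constant factors $e^{m-k}$ do not affect the monomials and are omitted.

```python
from collections import defaultdict


# Monomials are exponent tuples (u, x, y, A); A is treated as a formal symbol.
def mul(p, q):
    """Product of two sparse polynomials given as {exponents: coefficient}."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(a + b for a, b in zip(m1, m2))] += c1 * c2
    return {m: c for m, c in out.items() if c}


def unravel(p):
    """Repeatedly substitute xy -> u - 1 until no monomial contains xy."""
    p = dict(p)
    while True:
        hit = next((m for m in p if m[1] >= 1 and m[2] >= 1), None)
        if hit is None:
            return p
        c = p.pop(hit)
        u, x, y, a = hit
        for mono, coef in (((u + 1, x - 1, y - 1, a), c), ((u, x - 1, y - 1, a), -c)):
            p[mono] = p.get(mono, 0) + coef
            if p[mono] == 0:
                del p[mono]


def uxy_monomials(p):
    """Drop the formal A-exponent; keep the lattice monomials in u, x, y."""
    return {m[:3] for m in p}


ONE = {(0, 0, 0, 0): 1}
f = {(1, 0, 0, 0): 1, (0, 1, 0, 1): 1}          # f(u, x) = u + A x
y_f2 = unravel(mul({(0, 0, 1, 0): 1}, mul(f, f)))

m = 2
x_shift_monos = set()
for k in range(m + 1):
    fk = ONE
    for _ in range(k):
        fk = mul(fk, f)
    for i in range(m - k + 1):
        x_shift_monos |= uxy_monomials(mul({(0, i, 0, 0): 1}, fk))

print(uxy_monomials(y_f2) - x_shift_monos)       # {(2, 0, 1)}, i.e. only u^2 y
```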

In order to keep the triangular structure in general, we look at an arbitrary shift $y^i f^\ell$. Notice that, for ease of notation, we omit the factor $e^{m-\ell}$, as it does not influence the set of monomials. Since $f = u + Ax$, we can expand $y^i f^\ell$ by the binomial theorem:

$$y^i f^\ell = u^\ell y^i + \ell A u^{\ell-1} x y^i + \dots + A^\ell x^\ell y^i.$$

The first term introduces a new monomial $u^\ell y^i$. However, we will now derive a certain restriction under which all other monomials are already present in the lattice basis. Let us therefore look at the monomials of the second term after the substitution of $xy$:

$$u^{\ell-1} x y^i = u^{\ell-1}(u - 1)\, y^{i-1} = u^\ell y^{i-1} - u^{\ell-1} y^{i-1}.$$

The monomials $u^\ell y^{i-1}$ and $u^{\ell-1} y^{i-1}$ appear in $y^{i-1} f^\ell$ and $y^{i-1} f^{\ell-1}$, respectively.

In general, the $(j+1)$-th term of the binomial expansion contains monomials that appear in $y^{i-j} f^{\ell-k}$ for $k = 0, \dots, j$.

Therefore, the shift $y^i f^\ell$ introduces exactly one new monomial $u^\ell y^i$ if all shifts $y^{i-j} f^{\ell-k}$ for $j = 1, \dots, i-1$ and $k = 0, \dots, j$ were used in the construction of the




Maximizing Small Root Bounds by Linearization 




lattice basis. This is exactly the restriction that was called increasing pattern in [BM01].

Since the y-shifts $h_{j,k}$ in the original Boneh-Durfee attack satisfy this increasing pattern restriction, as shown in [BM01], we take in our analysis the y-shifts $h_{j,k}$ for the same set of indices $(j, k)$ as in [BD99]. I.e., we define the y-shifts

$$h_{j,k} = y^j f^k e^{m-k} \quad \text{for } j = 1, \dots, t \text{ and } k = \Big\lfloor \frac{m}{t} \Big\rfloor j, \dots, m. \tag{2}$$

We show that this set of y-shifts $h_{j,k}$ satisfies our requirement, i.e., we show that if $y^i f^\ell$ is a y-shift, then all of $y^{i-j} f^{\ell-k}$ for $j = 1, \dots, i-1$ and $k = 0, \dots, j$ are also used as shifts. Notice that it is sufficient to show that $y^{i-j} f^{\ell-j}$ is used as a shift.

Since $y^i f^\ell$ is in the set of y-shifts, we know that $\ell \in \{\lfloor \frac{m}{t} \rfloor i, \dots, m\}$ and therefore $\ell - j \in \{\lfloor \frac{m}{t} \rfloor i - j, \dots, m - j\}$. For $y^{i-j}$, on the other hand, the admissible exponents of $f$ are $\{\lfloor \frac{m}{t} \rfloor (i - j), \dots, m\}$. Our requirement is thus fulfilled if the condition

$$\Big\lfloor \frac{m}{t} \Big\rfloor i - j \;\ge\; \Big\lfloor \frac{m}{t} \Big\rfloor (i - j)$$

holds. We can rewrite this as $\lfloor \frac{m}{t} \rfloor \ge 1$, which holds if $m \ge t$.

Given the set of shift polynomials, we proceed with the computation of the determinant. For the following asymptotic analysis we let $t = \tau m$. Further, for the optimization we omit roundings, as their contribution is negligible for sufficiently large $m$.

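We are able to directly compute the contributions of the shift polynomials from (1) and (2). Here, we denote by $s_X$ the contribution of $X$ to the determinant, and analogously by $s_Y$, $s_U$ and $s_e$ the contributions of $Y$, $U$ and $e$.

$$\begin{aligned}
s_X &= \sum_{k=0}^{m} \sum_{i=0}^{m-k} i = \frac{1}{6} m^3 + o(m^3), \\
s_Y &= \sum_{j=1}^{t} \sum_{k=j/\tau}^{m} j = \frac{\tau^2}{6} m^3 + o(m^3), \\
s_U &= \sum_{k=0}^{m} \sum_{i=0}^{m-k} k + \sum_{j=1}^{t} \sum_{k=j/\tau}^{m} k = \Big(\frac{1}{6} + \frac{\tau}{3}\Big) m^3 + o(m^3), \\
s_e &= \sum_{k=0}^{m} \sum_{i=0}^{m-k} (m - k) + \sum_{j=1}^{t} \sum_{k=j/\tau}^{m} (m - k) = \Big(\frac{1}{3} + \frac{\tau}{6}\Big) m^3 + o(m^3), \\
\dim(L) &= \sum_{k=0}^{m} \sum_{i=0}^{m-k} 1 + \sum_{j=1}^{t} \sum_{k=j/\tau}^{m} 1 = \Big(\frac{1}{2} + \frac{\tau}{2}\Big) m^2 + o(m^2).
\end{aligned}$$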
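These leading terms can be checked numerically. The Python sketch below (our own check, not from the paper) enumerates the shifts (1) and (2) for a $t$ that divides $m$, so that the rounding $\lfloor m/t \rfloor$ is exact. It assumes, as in the triangular basis described above, that $g_{i,k}$ contributes the diagonal entry $U^k X^i e^{m-k}$ and $h_{j,k}$ contributes $U^k Y^j e^{m-k}$, and it compares the accumulated exponents with the leading terms.

```python
def shift_exponents(m, t):
    """Sum the diagonal exponents of X, Y, U, e and count the basis vectors."""
    sx = sy = su = se = dim = 0
    for k in range(m + 1):                 # x-shifts g_{i,k} = x^i f^k e^{m-k}
        for i in range(m - k + 1):
            sx += i
            su += k
            se += m - k
            dim += 1
    step = m // t                          # floor(m/t); exact when t divides m
    for j in range(1, t + 1):              # y-shifts h_{j,k} = y^j f^k e^{m-k}
        for k in range(step * j, m + 1):
            sy += j
            su += k
            se += m - k
            dim += 1
    return sx, sy, su, se, dim


def predicted(tau):
    """Leading coefficients of s_X, s_Y, s_U, s_e (in m^3) and dim (in m^2)."""
    return (1 / 6, tau**2 / 6, 1 / 6 + tau / 3, 1 / 3 + tau / 6, 0.5 + tau / 2)


m, tau = 400, 0.5
sx, sy, su, se, dim = shift_exponents(m, int(tau * m))
measured = (sx / m**3, sy / m**3, su / m**3, se / m**3, dim / m**2)
for name, pred, meas in zip(("s_X", "s_Y", "s_U", "s_e", "dim"), predicted(tau), measured):
    print(f"{name}: predicted {pred:.4f}, measured {meas:.4f}")
```

For m = 400 the measured values differ from the leading terms by less than one percent, consistent with the $o(m^3)$ and $o(m^2)$ error terms.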


Using these values together with the upper bounds $X = N^{\delta}$, $Y = N^{1/2}$, $U = N^{\delta + 1/2}$ on the variables in the usual enabling condition $\det(L) = X^{s_X} Y^{s_Y} U^{s_U} e^{s_e} < e^{m \dim(L)}$, we obtain an optimized value of $\tau = 1 - 2\delta$ and finally derive the desired Boneh-Durfee bound¹

$$\delta < \frac{1}{2}\big(2 - \sqrt{2}\big) \approx 0.292.$$
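The optimization step can be spelled out as follows (our own expansion of the computation, keeping only leading terms). With $e \approx N$, taking logarithms to base $N$ in $\det(L) < e^{m \dim(L)}$ and dividing by $m^3$ gives

$$\frac{\delta}{6} + \frac{1}{2}\cdot\frac{\tau^2}{6} + \Big(\delta + \frac{1}{2}\Big)\Big(\frac{1}{6} + \frac{\tau}{3}\Big) + \frac{1}{3} + \frac{\tau}{6} < \frac{1}{2} + \frac{\tau}{2}
\iff \tau^2 + (4\delta - 2)\,\tau + 4\delta - 1 < 0.$$

The left-hand side is minimized at $\tau = 1 - 2\delta$, where it equals $-(1 - 2\delta)^2 + 4\delta - 1 = -4\delta^2 + 8\delta - 2$. This is negative exactly when $2\delta^2 - 4\delta + 1 > 0$, i.e., for $\delta < 1 - \frac{\sqrt{2}}{2} = \frac{1}{2}(2 - \sqrt{2}) \approx 0.2929$ in the relevant range $0 \le \delta \le \frac{1}{2}$.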

¹ The given bound is for full-size $e$, i.e., we set $e \approx N$.
Notice that our choice of $\tau = 1 - 2\delta \le 1$ fulfills our previous restriction $m \ge t$. To summarize, the method of unravelled linearization provides a simple and elegant way to capture the sublattice structure in the Boneh-Durfee attack. In the following section, we use the same method to recover the hidden sublattice structure in the Jochemsz-May attack on small CRT-RSA exponents. This sublattice was previously unknown and was conjectured to be the key to improving the CRT-RSA attack bound.

4 CRT Exponents 

The task of attacking small CRT exponents was first mentioned as an open problem by Wiener [Wie90]. At PKC '06, Bleichenbacher and May [BM06] gave an attack that works in the case where $e$ is significantly smaller than $N$. They started with the CRT-RSA equations $ed_p = 1 + k(p-1)$ and $ed_q = 1 + l(q-1)$, and derived a single polynomial in the unknowns $(d_p, d_q, k, l)$ by setting $q = \frac{N}{p}$ and eliminating $p$:

$$e^2 d_p d_q - e(d_p + d_q) + e(d_q k + d_p l) - (k + l - 1) - (N - 1)kl = 0. \tag{3}$$

This equation can be linearized to

$$e^2 x_1 + e x_2 - (N - 1) x_3 - x_4 = 0 \tag{4}$$

with unknowns

$$x_1 = d_p d_q, \quad x_2 = d_q k + d_p l - d_p - d_q, \quad x_3 = kl, \quad x_4 = k + l - 1.$$
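Equations (3) and (4) can be sanity-checked on a toy CRT-RSA key. The Python sketch below (toy parameters and helper names are ours) derives $d_p, d_q$ from $d$, recovers $k$ and $l$ from the CRT-RSA equations, and evaluates the left-hand sides of (3) and (4), which should both be exactly zero.

```python
def crt_unknowns(p, q, e, d):
    """Return (d_p, d_q, k, l) with e*d_p = 1 + k(p-1) and e*d_q = 1 + l(q-1)."""
    dp, dq = d % (p - 1), d % (q - 1)
    k = (e * dp - 1) // (p - 1)
    l = (e * dq - 1) // (q - 1)
    return dp, dq, k, l


def eq3(N, e, dp, dq, k, l):
    """Left-hand side of Eq. (3)."""
    return (e**2 * dp * dq - e * (dp + dq) + e * (dq * k + dp * l)
            - (k + l - 1) - (N - 1) * k * l)


def eq4(N, e, dp, dq, k, l):
    """Left-hand side of the linearized Eq. (4)."""
    x1, x2, x3, x4 = dp * dq, dq * k + dp * l - dp - dq, k * l, k + l - 1
    return e**2 * x1 + e * x2 - (N - 1) * x3 - x4


p, q, e, d = 61, 53, 17, 2753          # toy key, illustrative only
N = p * q
dp, dq, k, l = crt_unknowns(p, q, e, d)
print(dp, dq, k, l)                                       # 53 49 15 16
print(eq3(N, e, dp, dq, k, l), eq4(N, e, dp, dq, k, l))   # 0 0
```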

For $d_p, d_q < N^{\delta}$ we get $k, l < N^{\alpha - 1/2 + \delta}$, where $\alpha = \log_N e$, and Eq. (4) directly leads to a lattice attack provided that $\delta$ is below an explicit bound depending on $\alpha$ (see [BM06]). However, for a full-size $e$, i.e., $\alpha = 1$, this attack does not work.

In 2007, Jochemsz and May [JM07] improved the analysis by exploiting the full algebraic structure of Eq. (3) with a Coppersmith-type attack. For the case $\alpha = 1$, they showed that it is possible to find small solutions if $\delta < 0.073$. However, in their experiments they noticed a big gap between the theoretically predicted bound and the experimentally observed bound. Namely, the experiments were far better than theoretically expected, indicating the possibility of a better bound.

E.g., according to their analysis, a lattice dimension of 56 should not suffice for attacking small CRT-exponents, while in practice it allows for solving up to $d_p, d_q < N^{0.01}$. Jochemsz and May reported that the smallest LLL vectors came from a sublattice and conjectured that identifying the sublattice structure would improve the bound, analogous to the case of the Boneh-Durfee attack, where the sublattice lifts the bound from $N^{0.284}$ to $N^{0.292}$.

In this section, we show that this conjecture is false. By using the method of unravelled linearization, we capture the sublattice structure behind the Jochemsz-May attack. This completely explains the experimental behavior in [JM07] and therefore closes the gap between practice and theoretical analysis.
As a result, we construct lattices of much smaller dimension than in [JM07], whose theoretical analysis exactly matches the experiments that we present in the subsequent section.

Very disappointingly from a cryptanalytic point of view, the size of the CRT-exponents $d_p, d_q$ that we are able to attack in polynomial time converges for growing lattice dimension to the same bound $N^{0.073}$ as in [JM07]. Thus, asymptotically we are unable to improve on the bound, although we fully exploit the sublattice structure. Nevertheless, we think that our method is of independent interest and will prove to be useful for other attacks, since it is simple and leads to an easy analysis.

Let us describe the attack in detail. The starting point is the polynomial equation (3). We proceed similarly to [BM06] and perform an (almost) identical linearization:

$$e^2 \underbrace{d_p d_q}_{u} - e\,\underbrace{(d_p + d_q)}_{v} + e\,\underbrace{(d_q k + d_p l)}_{w} - (\underbrace{k + l}_{x} - 1) - (N - 1)\underbrace{kl}_{y} = 0. \tag{5}$$


We now use the method of unravelled linearization with the linear polynomial $f = e^2u - ev + ew - x - Ay + 1$, where $A = N - 1$. The next step is to build up a lattice following the extended strategy from [JM06]. This means we use the monomials of $f^{m-1}$ as shifts and furthermore include extrashifts in the variables $u$ and $v$ up to some parameter $t$, which has to be optimized later.

The benefit in unravelled linearization comes from the fact that the variables $u, v, w, x, y$ are related. Namely, we have

$$\begin{aligned}
vwx &= (d_p + d_q)(d_p l + d_q k)(k + l) \\
&= d_p^2 kl + d_p^2 l^2 + d_p d_q (k + l)^2 + d_q^2 k^2 + d_q^2 kl \\
&= (d_p^2 + d_q^2)\,kl + (d_p^2 l^2 + d_q^2 k^2) + d_p d_q (k + l)^2 \\
&= \big((d_p + d_q)^2 - 2 d_p d_q\big)\,kl + \big((d_p l + d_q k)^2 - 2 d_p d_q kl\big) + d_p d_q (k + l)^2 \\
&= (v^2 - 2u)\,y + w^2 - 2uy + ux^2.
\end{aligned} \tag{6}$$


This non-obvious relation can easily be computed using a Gröbner basis computation. Recall the equations given by the linearization. These are 5 linearization equations in 9 unknowns, so we can eliminate the four variables $d_p, d_q, k, l$ via a Gröbner basis computation and obtain Eq. (6) in the unknowns $u, v, w, x, y$ only. This equation now serves in the back-substitution step of unravelled linearization, where we replace each occurrence of $vwx$ by the monomials $v^2y$, $uy$, $w^2$ and $ux^2$.
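The elimination itself needs a computer algebra system, but the resulting identity is easy to sanity-check. The Python sketch below (our own check) verifies Eq. (6) on random integer values of $d_p, d_q, k, l$ (it is a polynomial identity, so it must hold for all of them) and checks that the linear polynomial $f$ vanishes on the toy CRT-RSA key $p = 61$, $q = 53$, $e = 17$, $d = 2753$, whose unknowns are $d_p = 53$, $d_q = 49$, $k = 15$, $l = 16$.

```python
import random


def linearize(dp, dq, k, l):
    """The substitution of Eq. (5): u, v, w, x, y as functions of d_p, d_q, k, l."""
    return dp * dq, dp + dq, dq * k + dp * l, k + l, k * l


def relation6_holds(dp, dq, k, l):
    u, v, w, x, y = linearize(dp, dq, k, l)
    return v * w * x == (v**2 - 2 * u) * y + w**2 - 2 * u * y + u * x**2


# Eq. (6) is a polynomial identity, so it holds for arbitrary integer inputs.
rng = random.Random(1)
print(all(relation6_holds(*(rng.randint(-10**6, 10**6) for _ in range(4)))
          for _ in range(1000)))                      # True

# On the toy CRT-RSA key, the linear polynomial f vanishes as well.
p, q, e = 61, 53, 17
N, A = p * q, p * q - 1
u, v, w, x, y = linearize(53, 49, 15, 16)
print(e**2 * u - e * v + e * w - x - A * y + 1)       # 0
```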


To exemplify our method, we use the parameters $m = 2$ and $t = 1$. This is the smallest choice for which Jochemsz and May [JM07] found positive experimental results. In the framework of unravelled linearization, it is obvious why we do not obtain a positive result for smaller parameters: in order to improve upon the bound of Bleichenbacher and May [BM06], we have to use relation (6). However, the lattice parameters $m = 2$ and $t = 1$ are the smallest ones for which the monomial $vwx$ appears.
A lattice basis $B$ for $(m, t) = (2, 1)$ is given in Figure 3. We use here the notation from the original Coppersmith method over the integers, as opposed to the modular approach taken in Section 3. That is, we construct a lattice basis with the coefficient vectors of the shift polynomials as column vectors (refer to [Cop97] for details). For simplicity, we omit the left-hand side of the basis matrix, which contains just the inverses of the corresponding upper bounds of the monomials on its diagonal. The entries that come from the substitution are printed in bold letters.

For the lattice attack to work, we require the enabling condition $\det(L) > 1$ (see [Cop97]). In our example, computation of the determinant of the basis matrix yields


Fig. 3. Matrix of unravelled linearized polynomial for m = 2, t = 1
$$\det(B) = U^{-21} V^{-20} W^{-14} X^{-14} A^{15}.$$

We have upper bounds $(U, V, W, X) = (N^{2\delta}, N^{\delta}, N^{1/2 + 2\delta}, N^{1/2 + \delta})$ for the unknowns $(d_p d_q,\ d_p + d_q,\ d_q k + d_p l,\ k + l)$, respectively. Thus, with $A \approx N$, the enabling condition $\det(L) > 1$ reduces to $N^{1 - 104\delta} > 1$, i.e., $\delta < \frac{1}{104} \approx 0.01$. This perfectly matches the experimental results of Jochemsz and May for parameters $(m, t) = (2, 1)$.
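The reduction of the enabling condition to a bound on $\delta$ is plain exponent arithmetic. The Python sketch below (our own check) takes $\det(B) = U^{-21} V^{-20} W^{-14} X^{-14} A^{15}$ with $A = N$ and the stated upper bounds, and evaluates $\log_N \det(B)$ exactly with fractions. It comes out as $1 - 104\delta$, whose root $1/104 \approx 0.0096$ agrees with the $\delta$ listed for $(m, t) = (2, 1)$ in Table 1.

```python
from fractions import Fraction


def log_n_det(delta):
    """log_N det(B) for (m, t) = (2, 1), with A = N and
    (U, V, W, X) = (N^{2 delta}, N^{delta}, N^{1/2 + 2 delta}, N^{1/2 + delta})."""
    half = Fraction(1, 2)
    return (15
            - 21 * (2 * delta)
            - 20 * delta
            - 14 * (half + 2 * delta)
            - 14 * (half + delta))


print(log_n_det(Fraction(0)))                 # 1, so the exponent is 1 - 104*delta
bound = Fraction(1, 104)
print(log_n_det(bound), round(float(bound), 4))   # 0 0.0096
```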

We now proceed to the asymptotic analysis and start by analyzing the simpler case without any extrashifts. I.e., we shift in the monomials of $f^{m-1}$ only, but we have to exclude all monomials that are divisible by $vwx$, since these can be written as the linear combination from Eq. (6).

To compute the value of the determinant, we begin by counting the number of shift polynomials, as each one contributes a factor of $A$ to the determinant. The number of shift polynomials equals the number of monomials in the set

$$\Big\{ u^{e_1} v^{e_2} w^{e_3} x^{e_4} y^{e_5} \;\Big|\; e_i \in \mathbb{N}_0,\ \sum_{i=1}^{5} e_i \le m - 1,\ e_2 = 0 \text{ or } e_3 = 0 \text{ or } e_4 = 0 \Big\}.$$

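Their number can be computed as

$$\Big|\Big\{ (e_1, \dots, e_5) \in \mathbb{N}_0^5 \;\Big|\; \sum_{i=1}^{5} e_i \le m - 1 \Big\}\Big| \;-\; \Big|\Big\{ (e_1, \dots, e_5) \in \mathbb{N}_0^5 \;\Big|\; \sum_{i=1}^{5} e_i \le m - 1,\ e_2, e_3, e_4 \ge 1 \Big\}\Big|.$$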

Let us derive the size of the first set by counting. Write $e_1 + e_2 + e_3 + e_4 + e_5 + h = m - 1$ for some slack variable $h \in \{0, \dots, m - 1\}$ to transform the inequality into an equality. If we set $e_i' = e_i + 1$ and $h' = h + 1$, then the number of tuples that fulfill the equation

$$e_1' + e_2' + e_3' + e_4' + e_5' + h' = (m - 1) + 6 \quad \text{with } e_i', h' \ge 1$$

is exactly the number of ordered partitions of $m + 5$ into 6 parts. Let us write $m + 5 = 1 + 1 + \dots + 1$; then one obtains an ordered 6-partition of $m + 5$ by choosing 5 out of the $m + 4$ plus signs as breakpoints for the partition. We have $\binom{m+4}{5}$ possibilities for this choice.

The size of the second set is derived in a similar fashion, where we require $e_1' + e_2 + e_3 + e_4 + e_5' + h' = m + 2$. In this case, the number of tuples is $\binom{m+1}{5}$. In summary, we obtain for the number of shifts

$$\#\text{shifts} = \binom{m+4}{5} - \binom{m+1}{5} = \frac{1}{8} m^4 + o(m^4).$$
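The closed form can be cross-checked by brute force. The Python sketch below (our own check) enumerates the exponent vectors $(e_1, \dots, e_5)$ of $u, v, w, x, y$ directly, keeps those of total degree at most $m - 1$ that are not divisible by $vwx$, and compares the count with $\binom{m+4}{5} - \binom{m+1}{5}$ and its leading term $m^4/8$.

```python
from itertools import product
from math import comb


def shifts_brute_force(m):
    """Count u^e1 v^e2 w^e3 x^e4 y^e5 with sum(e) <= m-1 and not divisible by vwx."""
    count = 0
    for e in product(range(m), repeat=5):
        if sum(e) <= m - 1 and (e[1] == 0 or e[2] == 0 or e[3] == 0):
            count += 1
    return count


def shifts_formula(m):
    return comb(m + 4, 5) - comb(m + 1, 5)


for m in range(1, 9):
    assert shifts_brute_force(m) == shifts_formula(m)
print(shifts_formula(2), shifts_formula(3))      # 6 21
print(shifts_formula(1000) / 1000**4)            # close to 1/8
```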

The second part contributing to the determinant comes from the monomials that occur in the lattice basis. This is the product of the diagonal entries in the submatrix on the left that has been omitted in Figure 3. As mentioned before, the diagonal entries consist of the inverses of the upper bound of the monomial corresponding to that row. The explicit computation is given in Appendix A, while we only state the results here.

* U = ^ 2 0 + 3 4 2 ) = + 

#v = #w = #x= +2 ( m ^ 2 ) = y^w 4 + o(to 4 ). 

Recall that the enabling condition for the lattice attack is $\det(L) > 1$. With the previously derived values, neglecting low-order terms and setting $A = N$, we are able to write the determinant as

det(L) = U~^ mi V~^ m4 W~^ mi X~^ m4 Ni mi . 

If we use the upper bounds $(U, V, W, X) = (N^{2\delta}, N^{\delta}, N^{1/2 + 2\delta}, N^{1/2 + \delta})$ on the sizes of the variables, we derive the condition

This is the same asymptotic bound that was obtained by Jochemsz and May [JM07] without extrashifts. So, unfortunately, our new lattice does not improve the asymptotic bound of [JM07]. But, as opposed to [JM07], our approach requires smaller lattice dimensions: asymptotically, [JM07] need to LLL-reduce a lattice of dimension of order $m^3$, while our lattice dimension is also of order $m^3$ but with a smaller leading constant. Figure 4 shows a comparison of the two methods in terms of the size of $d_p, d_q$ that can be attacked.

While our approach clearly allows for attacking larger values of CRT-exponents in practice, we would also like to stress the fact that, as opposed to [JM07], the experimental behavior of our attack can be completely explained by our theoretical analysis, thereby also explaining the experimental behavior of [JM07]. We will show this in the subsequent section.

Fig. 4. Comparison of the achievable bound depending on the lattice dimension

If we also use so-called extrashifts, then we end up with a slightly improved bound of $d_p, d_q < N^{0.073}$ as in [JM07]. The analysis can be done in a similar fashion to the case without extrashifts. We carry out the calculations in Appendix B.

5 Experiments 

The reason for carrying out various experiments for attacking CRT-RSA is twofold. First, we want to show that our analysis from Section 4 is indeed optimal. That is, the experimental behavior can be perfectly predicted by the analysis, and there is no hope of improving the bound by this approach. Second, as our lattice-based approach is heuristic, we have to verify that the polynomials that we obtain after the lattice reduction are indeed coprime and thus allow for efficient recovery of their roots.


Table 1. Experimental Results 


| N | d_p, d_q | δ | lattice parameters | dim JM | LLL-time JM | LLL-time (s) |
|---|---|---|---|---|---|---|
| 1000 bit | 11 bit | 0.0096 | m = 2, t = 1, dim = 30 | 56 | 14 | 2 |
| 1000 bit | 18 bit | 0.0178 | m = 3, t = 1, dim = 60 | 115 | 6100 | 258 |
| 1000 bit | 22 bit | 0.0226 | m = 3, t = 2, dim = 93 | - | - | 3393 |
| 1000 bit | 24 bit | 0.0244 | m = 4, t = 1, dim = 105 | - | - | 7572 |
| 1000 bit | 29 bit | 0.0291 | m = 4, t = 2, dim = 154 | - | - | 61298 |
| 2000 bit | 21 bit | 0.0096 | m = 2, t = 1, dim = 30 | 56 | 40 | 4 |
| 2000 bit | 35 bit | 0.0178 | m = 3, t = 1, dim = 60 | 115 | 20700 | 613 |
| 2000 bit | 45 bit | 0.0226 | m = 3, t = 2, dim = 93 | - | - | 13516 |
| 2000 bit | 47 bit | 0.0244 | m = 4, t = 1, dim = 105 | - | - | 34305 |
| 5000 bit | 48 bit | 0.0096 | m = 2, t = 1, dim = 30 | 56 | 379 | 39 |
| 5000 bit | 89 bit | 0.0178 | m = 3, t = 1, dim = 60 | - | - | 5783 |
| 5000 bit | 113 bit | 0.0226 | m = 3, t = 2, dim = 93 | - | - | 74417 |
| 10000 bit | 96 bit | 0.0096 | m = 2, t = 1, dim = 30 | 56 | 2500 | 360 |
| 10000 bit | 179 bit | 0.0178 | m = 3, t = 1, dim = 60 | - | - | 31226 |


We reimplemented the attack of [JM07] and used in the experiments the same modulus sizes and lattice parameters as in [JM07]. Table 1 clearly shows the speedup for the LLL reduction. For example, with parameters $m = 3$ and $t = 1$, our method is roughly 24 to 34 times faster than the one of Jochemsz and May. As previously mentioned, this is due to the reduced lattice dimension¹. While Jochemsz and May required the reduction of a lattice of dimension 115, our lattice only has dimension 60. Because of this smaller lattice dimension, we were able to perform experiments on parameter sets that had been out of reach before.

¹ The lattice we are considering here is the one that serves as input to the LLL reduction routine, that is, the sublattice containing zeros in the coordinates corresponding to the shift polynomials.

Notice that the experimental results on the achievable sizes of $d_p$ and $d_q$ perfectly match the theoretically predicted bound $\delta$. This is a strong indication that our approach is indeed optimal.

We ran our experiments using Sage 4.1.1 and used the $L^2$ reduction algorithm from Nguyen and Stehlé [NS09]. The calculations were performed on a quad-core Intel Xeon processor running at 2.66 GHz.

References

[BD99] Boneh, D., Durfee, G.: Cryptanalysis of RSA with Private Key d Less than $N^{0.292}$. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 1-11. Springer, Heidelberg (1999)

[BM01] Blömer, J., May, A.: Low Secret Exponent RSA Revisited. In: Silverman, J.H. (ed.) CaLC 2001. LNCS, vol. 2146, pp. 4-19. Springer, Heidelberg (2001)

[BM06] Bleichenbacher, D., May, A.: New Attacks on RSA with Small Secret CRT-Exponents. In: Yung, M., Dodis, Y., Kiayias, A., Malkin, T.G. (eds.) PKC 2006. LNCS, vol. 3958, pp. 1-13. Springer, Heidelberg (2006)

[Cop96a] Coppersmith, D.: Finding a Small Root of a Bivariate Integer Equation; Factoring with High Bits Known. In: Maurer [Mau96], pp. 178-189 (1996)

[Cop96b] Coppersmith, D.: Finding a Small Root of a Univariate Modular Equation. In: Maurer [Mau96], pp. 155-165 (1996)

[Cop97] Coppersmith, D.: Small Solutions to Polynomial Equations, and Low Exponent RSA Vulnerabilities. J. Cryptology 10(4), 233-260 (1997)

[HG97] Howgrave-Graham, N.: Finding Small Roots of Univariate Modular Equations Revisited. In: Darnell, M.J. (ed.) Cryptography and Coding 1997. LNCS, vol. 1355, pp. 131-142. Springer, Heidelberg (1997)

[HM09] Herrmann, M., May, A.: Attacking Power Generators Using Unravelled Linearization: When Do We Output Too Much? In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 487-504. Springer, Heidelberg (2009)

[JM06] Jochemsz, E., May, A.: A Strategy for Finding Roots of Multivariate Polynomials with New Applications in Attacking RSA Variants. In: Lai, X., Chen, K. (eds.) ASIACRYPT 2006. LNCS, vol. 4284, pp. 267-282. Springer, Heidelberg (2006)

[JM07] Jochemsz, E., May, A.: A Polynomial Time Attack on RSA with Private CRT-Exponents Smaller Than $N^{0.073}$. In: Menezes, A. (ed.) CRYPTO 2007. LNCS, vol. 4622, pp. 395-411. Springer, Heidelberg (2007)

[LLL82] Lenstra, A.K., Lenstra, H.W., Lovász, L.: Factoring Polynomials with Rational Coefficients. Mathematische Annalen 261(4), 515-534 (1982)

[Mau96] Maurer, U.M. (ed.): EUROCRYPT 1996. LNCS, vol. 1070. Springer, Heidelberg (1996)

[NS09] Nguyen, P.Q., Stehlé, D.: An LLL Algorithm with Quadratic Complexity. SIAM J. Comput. 39(3), 874-903 (2009)

[QC82] Quisquater, J.J., Couvreur, C.: Fast Decipherment Algorithm for RSA Public-key Cryptosystem. Electronics Letters 18, 905 (1982)

[Wie90] Wiener, M.J.: Cryptanalysis of Short RSA Secret Exponents. IEEE Transactions on Information Theory 36(3), 553-558 (1990)




Maximizing Small Root Bounds by Linearization 




A Counting #u, #v, #w, #x

The monomials that contribute to the determinant are exactly the monomials of $f^m$ that do not contain the variable $y$. Denote such a monomial by $u^{e_1} v^{e_2} w^{e_3} x^{e_4}$. In order to count the number of $u$'s that contribute to the determinant, we proceed as follows.

Let $e_1 = 0$. We have $e_2 + e_3 + e_4 \le m$ with $e_i \in \mathbb{N}_0$, which transforms into $e'_2 + e'_3 + e'_4 + h' = m + 4$ for a slack variable $h' \in \{1, \dots, m+1\}$ and $e'_i = e_i + 1$. The number of such tuples is just the number of 4-partitions of $m + 4$, which is $\binom{m+3}{3}$. From these tuples we have to remove the ones with $e_i \ge 1$ for $i = 2, 3, 4$, because of the substitution of $vwx$. The number of these tuples is $\binom{m}{3}$. For $e_1 = 1$, we proceed similarly and obtain $\binom{m+2}{3} - \binom{m-1}{3}$. We carry this out for all possibilities of $e_1$ and end up with $e_1 = m$, where we get $\binom{3}{3} - \binom{0}{3}$.

Now we know the number of occurrences of each power $u^i$, $i = 0, \dots, m$. In order to count the total number of $u$'s, we compute the weighted sum

$$\#u = \sum_{i=0}^{m} i \left[\binom{m+3-i}{3} - \binom{m-i}{3}\right].$$

Using the identities $\binom{n}{k} - \binom{n-1}{k} = \binom{n-1}{k-1}$ and $\sum_{j=0}^{n} \binom{j}{k} = \binom{n+1}{k+1}$, we eventually obtain

$$\#u = \binom{m+1}{2} + 3\binom{m+2}{4}.$$

Thus, $\#u = \frac{1}{8}m^4 + o(m^4)$.

Counting the number of occurrences of $v$, $w$ and $x$ can be done in a similar way, and we obtain

$$\#v = \#w = \#x = 2\binom{m+2}{4} + \binom{m+2}{3} = \frac{1}{12}m^4 + o(m^4).$$
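The closed forms above can be sanity-checked by brute force. The following Python sketch is our own illustration, not code from the paper, and its function names are hypothetical. It assumes, as stated at the beginning of this appendix, that the contributing monomials are $u^{e_1}v^{e_2}w^{e_3}x^{e_4}$ with $e_1 + e_2 + e_3 + e_4 \le m$, minus those with $e_2, e_3, e_4 \ge 1$, and it compares the exponent sums with the weighted sum and the formulas for $\#u$ and $\#v = \#w = \#x$.

```python
from itertools import product
from math import comb


def brute_counts(m: int):
    """Exponent sums of u, v, w, x over the monomials u^e1 v^e2 w^e3 x^e4 with
    e1 + e2 + e3 + e4 <= m, skipping those with e2, e3, e4 >= 1 (removed by
    the substitution of vwx)."""
    totals = [0, 0, 0, 0]
    for e in product(range(m + 1), repeat=4):
        if sum(e) > m or (e[1] >= 1 and e[2] >= 1 and e[3] >= 1):
            continue
        for i in range(4):
            totals[i] += e[i]
    return tuple(totals)


def weighted_sum_u(m: int) -> int:
    """#u as the weighted sum over the powers u^i, i = 0..m."""
    return sum(i * (comb(m + 3 - i, 3) - comb(m - i, 3)) for i in range(m + 1))


def closed_forms(m: int):
    """(#u, #v), where #v = #w = #x."""
    return comb(m + 1, 2) + 3 * comb(m + 2, 4), 2 * comb(m + 2, 4) + comb(m + 2, 3)


if __name__ == "__main__":
    for m in range(1, 9):
        nu, nv = closed_forms(m)
        assert brute_counts(m) == (nu, nv, nv, nv)
        assert weighted_sum_u(m) == nu
    print("closed forms agree with brute force for m = 1..8")
```

The brute-force loop is exponential in the number of variables but only meant for small $m$; the closed forms are what enter the determinant computation.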
B Improving the Bound Using Extrashifts

In the following, we show that it is possible to improve the bound $\delta < \frac{1}{14} \approx 0.0714$ to $\delta \approx 0.0734$ by using so-called extrashifts. In this case, we use the set of shifts

$$S = \bigcup_{t_1=0}^{t} \bigcup_{t_2=0}^{t-t_1} \left\{ u^{e_1+t_1} v^{e_2+t_2} w^{e_3} x^{e_4} y^{e_5} \;\middle|\; u^{e_1} v^{e_2} w^{e_3} x^{e_4} y^{e_5} \text{ is a monomial of } f^{m-1} \right\}.$$


To estimate the number of shifts, one may use a combinatorial proof as in Section 4 and count the number of all monomials minus the monomials having $e_2, e_3, e_4 \ge 1$. However, we choose to use a computational approach here and simply evaluate a series of sums.

The shift monomials can be characterized by the set $S_1 \setminus S_2$, where $S_1$ is the set of all shifts and $S_2$ contains the shifts that have to be removed due to the substitution of $vwx$.


We have

$$u^{e_1} v^{e_2} w^{e_3} x^{e_4} y^{e_5} \in S_1 \iff \begin{cases} e_5 = 0, \dots, m-1 \\ e_4 = 0, \dots, m-1-e_5 \\ e_3 = 0, \dots, m-1-e_5-e_4 \\ e_2 = 0, \dots, m-1-e_5-e_4-e_3+t \\ e_1 = 0, \dots, m-1-e_5-e_4-e_3-e_2+t \end{cases}$$

$$u^{e_1} v^{e_2} w^{e_3} x^{e_4} y^{e_5} \in S_2 \iff \begin{cases} e_5 = 0, \dots, m-1 \\ e_4 = 1, \dots, m-1-e_5 \\ e_3 = 1, \dots, m-1-e_5-e_4 \\ e_2 = 1, \dots, m-1-e_5-e_4-e_3+t \\ e_1 = 0, \dots, m-1-e_5-e_4-e_3-e_2+t \end{cases}$$

Setting $t = \tau m$, the resulting number of shifts is

$$|S_1 \setminus S_2| = \left(\frac{1}{8} + \frac{\tau}{2} + \frac{\tau^2}{2}\right) m^4 + o(m^4).$$
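To double-check the shift count, the following sketch enumerates $S_1 \setminus S_2$ directly from the index ranges of the characterization of $S_1$ and $S_2$, reading the upper bound of $e_1$ as including $+t$ (as in the sums for $s_u$ below). It compares the result with a single sum over $s = e_3 + e_4 + e_5$ and evaluates that sum for large $m$ against the leading coefficient $\frac{1}{8} + \frac{\tau}{2} + \frac{\tau^2}{2}$. This is an illustrative sketch under that reading, not the authors' code, and the function names are hypothetical.

```python
from math import comb


def shifts_brute(m: int, t: int) -> int:
    """|S1 \\ S2| by enumerating the index ranges (e1's bound includes +t)."""
    count = 0
    for e5 in range(m):
        for e4 in range(m - e5):
            for e3 in range(m - e5 - e4):
                r = m - 1 - e5 - e4 - e3 + t          # bound on e1 + e2
                for e2 in range(r + 1):
                    for _ in range(r - e2 + 1):       # e1 = 0 .. r - e2
                        if not (e2 >= 1 and e3 >= 1 and e4 >= 1):
                            count += 1
    return count


def shifts_sum(m: int, t: int) -> int:
    """Same count as one sum over s = e3 + e4 + e5: |S1| - |S2| per value of s."""
    total = 0
    for s in range(m):
        r = m - 1 - s + t
        # C(s+2,2) triples (e3,e4,e5) in S1, C(s,2) of them with e3, e4 >= 1;
        # C(r+2,2) pairs (e1,e2) in S1, C(r+1,2) of them with e2 >= 1.
        total += comb(s + 2, 2) * comb(r + 2, 2) - comb(s, 2) * comb(r + 1, 2)
    return total


if __name__ == "__main__":
    assert all(shifts_brute(m, t) == shifts_sum(m, t)
               for m in range(1, 7) for t in range(4))
    m, tau = 4000, 0.5
    print(shifts_sum(m, int(tau * m)) / m**4, 1 / 8 + tau / 2 + tau**2 / 2)
```

The ratio printed for $m = 4000$ should be close to, but not exactly equal to, the leading coefficient, since the $o(m^4)$ terms are still visible at that size.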


In a similar fashion, we derive the exponents of the variables $u$, $v$, $w$ and $x$ contributing to the determinant. For example, to calculate the number of occurrences of $u$, we compute

$$s_u = \sum_{e_4=0}^{m} \sum_{e_3=0}^{m-e_4} \sum_{e_2=0}^{m-e_4-e_3+t} \sum_{e_1=0}^{m-e_4-e_3-e_2+t} e_1 \;-\; \sum_{e_4=1}^{m} \sum_{e_3=1}^{m-e_4} \sum_{e_2=1}^{m-e_4-e_3+t} \sum_{e_1=0}^{m-e_4-e_3-e_2+t} e_1.$$
For the other values we obtain

$$s_v = \left(\frac{1}{12} + \frac{\tau}{3} + \frac{\tau^2}{2} + \frac{\tau^3}{3}\right) m^4 + o(m^4),$$

$$s_w = \left(\frac{1}{12} + \frac{\tau}{3} + \frac{\tau^2}{4}\right) m^4 + o(m^4),$$

$$s_x = \left(\frac{1}{12} + \frac{\tau}{3} + \frac{\tau^2}{4}\right) m^4 + o(m^4).$$

We use these values together with the upper bounds $(U, V, W, X) = (N^{2\delta}, N^{\delta}, N^{1/2+2\delta}, N^{1/2+\delta})$ to compute the determinant of the lattice. After that, we are able to solve the enabling condition $\det(L) > 1$ for $\delta$ and optimize the value of $\tau$ to maximize $\delta$. We obtain $\tau \approx 0.381788$, which finally leads to the bound

$$\delta < 0.0734142.$$
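The exponent sums of Appendix B can be checked numerically in the same way. The sketch below is again our own and uses hypothetical names. It sums the exponents of $v$ and $w$ over the index ranges used for $s_u$, excluding monomials with $e_2, e_3, e_4 \ge 1$, and compares the brute-force totals with a single sum over $s = e_3 + e_4$. It then evaluates that sum at large $m$ with $t = \tau m$ against the leading coefficients $\frac{1}{12} + \frac{\tau}{3} + \frac{\tau^2}{2} + \frac{\tau^3}{3}$ for $s_v$ and $\frac{1}{12} + \frac{\tau}{3} + \frac{\tau^2}{4}$ for $s_w$ ($s_x$ equals $s_w$ by symmetry).

```python
from math import comb


def exponent_sums_brute(m: int, t: int):
    """(s_v, s_w) by enumerating the index ranges of the s_u sums,
    excluding monomials with e2, e3, e4 >= 1."""
    sv = sw = 0
    for e4 in range(m + 1):
        for e3 in range(m - e4 + 1):
            r = m - e4 - e3 + t                      # bound on e1 + e2
            for e2 in range(r + 1):
                for _ in range(r - e2 + 1):          # e1 = 0 .. r - e2
                    if e2 >= 1 and e3 >= 1 and e4 >= 1:
                        continue
                    sv += e2
                    sw += e3
    return sv, sw


def exponent_sums(m: int, t: int):
    """Same sums collapsed to a single loop over s = e3 + e4."""
    sv = sw = 0
    for s in range(m + 1):
        r = m - s + t
        # sum of e2 over e1 + e2 <= r is C(r+2,3), with or without e2 >= 1;
        # s+1 pairs (e3,e4) in total, max(s-1,0) of them with e3, e4 >= 1.
        sv += min(s + 1, 2) * comb(r + 2, 3)
        # sum of e3 over all pairs is s(s+1)/2, over pairs with e3, e4 >= 1 it is s(s-1)/2.
        sw += s * (s + 1) // 2 * comb(r + 2, 2) - s * (s - 1) // 2 * comb(r + 1, 2)
    return sv, sw


if __name__ == "__main__":
    assert all(exponent_sums_brute(m, t) == exponent_sums(m, t)
               for m in range(7) for t in range(4))
    m, tau = 4000, 0.4
    sv, sw = exponent_sums(m, int(tau * m))
    print(sv / m**4, 1 / 12 + tau / 3 + tau**2 / 2 + tau**3 / 3)
    print(sw / m**4, 1 / 12 + tau / 3 + tau**2 / 4)
```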
Abstract. We study the problem of integer factoring given implicit information of a special kind. The problem is as follows: let $N_1 = p_1 q_1$ and $N_2 = p_2 q_2$ be two RSA moduli of the same bit-size, where $q_1, q_2$ are $\alpha$-bit primes. We are given the implicit information that $p_1$ and $p_2$ share $t$ most significant bits. We present a novel and rigorous lattice-based method that leads to the factorization of $N_1$ and $N_2$ in polynomial time as soon as $t \ge 2\alpha + 3$. Subsequently, we heuristically generalize the method to $k$ RSA moduli $N_i = p_i q_i$ where the $p_i$'s all share $t$ most significant bits (MSBs) and obtain an improved bound on $t$ that converges to $t \ge \alpha + 3.55\ldots$ as $k$ tends to infinity. We also study the case where the $k$ factors $p_i$ share $t$ contiguous bits in the middle and find a bound that converges to $2\alpha + 3$ when $k$ tends to infinity. This paper extends the work of May and Ritzenhofen in [9], where similar results were obtained when the $p_i$'s share least significant bits (LSBs). In [15], Sarkar and Maitra describe an alternative but heuristic method for only two RSA moduli, when the $p_i$'s share LSBs and/or MSBs, or bits in the middle. In the case of shared MSBs or bits in the middle and two RSA moduli, they get better experimental results in some cases, but we use much lower (at least 23 times lower) lattice dimensions and so we obtain a great speedup (at least $10^3$ times faster). Our results rely on the following surprisingly simple algebraic relation in which the shared MSBs of $p_1$ and $p_2$ cancel out: $q_1 N_2 - q_2 N_1 = q_1 q_2 (p_2 - p_1)$. This relation allows us to build a lattice whose shortest vector yields the factorization of the $N_i$'s.

Keywords: implicit factorization, lattices, RSA. 


1 Introduction 

Efficient factorization of large integers is one of the most fundamental problems of Algorithmic Number Theory, and has fascinated mathematicians for centuries. It has been studied particularly intensively over the past 35 years, all the more so since efficient factorization leads immediately to an attack on the RSA Cryptosystem. In the 1970's, the first general-purpose sub-exponential algorithm for factoring was developed by Morrison and Brillhart in [?] (improving a method described for the first time in [?]), using continued fraction techniques. Several faster general-purpose algorithms have been proposed over the past years, the most recent and efficient being the general number field sieve (GNFS) [?], proposed in 1993. It is not known whether factoring integers can be done in polynomial time on a classical Turing machine. On quantum machines, Shor's algorithm [16] allows polynomial-time factoring of integers. However, it is still an open question whether a capable-enough quantum computer can be built.

P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010. LNCS 6056, pp. 70-87, 2010.
© International Association for Cryptologic Research 2010

Implicit Factoring with Shared Most Significant and Middle Bits

At the same time, the problem of factoring integers given additional information about their factors has been studied since 1985. In [?], Rivest and Shamir showed that $N = pq$ of bit-size $n$ and with balanced factors ($\log_2(p) \approx \log_2(q) \approx \frac{n}{2}$) can be factored in polynomial time as soon as we have access to an oracle that returns the $\frac{n}{3}$ most significant bits (MSBs) of $p$. Beyond its theoretical interest, the motivation behind this is mostly cryptographic. In fact, during an attack on an RSA-encrypted exchange, the cryptanalyst may have access to additional information beyond the RSA public parameters $(e, N)$, which may be gained, for instance, through side-channel attacks revealing some of the bits of the secret factors. Besides, some variations of the RSA Cryptosystem purposely leak some of the secret bits (for instance, [?]). In 1996, Rivest and Shamir's results were improved in [2] by Coppersmith, who applied lattice-based methods to the problem of finding small integer roots of bivariate integer polynomials (the now so-called Coppersmith's method). It requires only half of the most significant bits of $p$ to be known to the cryptanalyst (that is, $\frac{n}{4}$).

In PKC 2009, May and Ritzenhofen [9] significantly reduced the power of the oracle. Given an RSA modulus $N_1 = p_1 q_1$, they allow the oracle to output a new and different RSA modulus $N_2 = p_2 q_2$ such that $p_1$ and $p_2$ share at least $t$ least significant bits (LSBs). Note that the additional information here is only implicit: the attacker does not know the actual value of the $t$ least significant bits of the $p_i$'s; he only knows that $p_1$ and $p_2$ share them. In the rest of the paper, we will refer to this problem as the problem of implicit factoring. When $q_1$ and $q_2$ are $\alpha$-bit primes, May and Ritzenhofen's lattice-based method rigorously finds the factorization of $N_1$ and $N_2$ in quadratic time when $t \ge 2\alpha + 3$. Besides, their technique heuristically generalizes to $k - 1$ oracle queries that give access to $k$ different RSA moduli $N_i = p_i q_i$, with all the $p_i$'s sharing $t$ least significant bits. With $k - 1$ queries, the bound on $t$ improves to $t \ge \frac{k}{k-1}\alpha$. Note that these results are of interest for unbalanced RSA moduli: for instance, if $N_1 = p_1 q_1$, $N_2 = p_2 q_2$ are 1000-bit RSA moduli and the $q_i$'s are 200-bit primes, knowing that $p_1$ and $p_2$ share at least 403 least significant bits out of 800 is enough to factorize $N_1$ and $N_2$ in polynomial time. Note also that the method absolutely requires that the shared bits be the least significant ones. They finally apply their method to factorize $k$ $n$-bit balanced RSA moduli $N_i = p_i q_i$ under some conditions and with an additional exhaustive search
of 2?. 

Very recently, in [15], Sarkar and Maitra applied Coppersmith and Gröbner-basis techniques to the problem of implicit factoring, and heuristically improved the bounds in some of the cases. Contrary to [9], their method applies when either (or both) the LSBs or MSBs of $p_1, p_2$ are shared (or when bits in the middle are shared). Namely, in the case of shared LSBs they obtain better theoretical bounds on $t$ than [9] as soon as $\alpha > 0.266n$. Besides, their experiments often perform better than their theoretical bounds, and they improve in practice the bound on $t$ of [9] when $\alpha > 0.21n$. Note finally that their bounds are very similar in the two cases of shared MSBs and shared LSBs. Readers interested in their precise bounds may refer to their paper [15].

Unfortunately, Sarkar and Maitra's method is heuristic even in the case of two RSA moduli, and does not generalize to $k \ge 3$ RSA moduli. In fact, when the $p_i$'s share MSBs and/or LSBs, their method consists in building a polynomial $f_1$ in three variables, whose roots are $(q_2, q_1, \frac{p_1 - p_2}{2^\gamma})$, where $\gamma$ is the number of shared LSBs between $p_1$ and $p_2$. That is, $\frac{p_1 - p_2}{2^\gamma}$ represents the part of $p_1 - p_2$ where the shared bits do not cancel out. To find the integer roots of $f_1$, they use the Coppersmith-like technique of [?], which consists in computing two (or more) new polynomials $f_2, f_3, \dots$ sharing the same roots as $f_1$. If the variety defined by $f_1, f_2, f_3, \dots$ is 0-dimensional, then the roots can be easily recovered by computing resultants or a Gröbner basis. However, with an input polynomial in more than two variables, the method is heuristic: there is no guarantee that the polynomials $f_1, f_2, f_3, \dots$ define a 0-dimensional variety. We reproduced the results of Sarkar and Maitra and observed that $f_1, f_2, f_3, \dots$ almost never defined a 0-dimensional variety. They observed, however, that it was possible to recover the roots of the polynomials directly by looking at the coefficients of the polynomials in the Gröbner basis of the ideal generated by the $f_i$'s, even when the ideal was of positive dimension. The assumption on which their work relies is that this will always be possible. For instance, in the case of shared MSBs between $p_1$ and $p_2$, they found in
their experiments that the Grobner basis contained a polynomial multiple of x — — 1 
whose coefficients lead immediately to the factorization of $N_1$ and $N_2$. They support their assumption by experimental data: in most cases their experiments perform better than their theoretical bounds. It seems nevertheless that their assumption is not fully understood.

Our contribution consists of a novel and rigorous lattice-based method that addresses the implicit factoring problem when $p_1$ and $p_2$ share most significant bits. That is, we obtain an analog of May and Ritzenhofen's results for shared MSBs, and, contrary to the work of Sarkar and Maitra in [15], our method is rigorous. Namely, let $N_1 = p_1 q_1$ and $N_2 = p_2 q_2$ be two RSA moduli of the same bit-size $n$. If $q_1, q_2$ are $\alpha$-bit primes and $p_1, p_2$ share $t$ most significant bits, our method provably factorizes $N_1$ and $N_2$ as soon as $t \ge 2\alpha + 3$ (which is the same as the bound on $t$ for least significant bits in [9]). This is the first rigorous bound on $t$ when $p_1$ and $p_2$ share most significant bits. From this method, we deduce a new heuristic lattice-based method for the case when $p_1$ and $p_2$ share $t$ bits in the middle. Moreover, contrary to [15], these methods heuristically generalize to an arbitrary number $k$ of RSA moduli and do not depend on the position of the shared bits in the middle, allowing us to factorize $k$ RSA moduli as soon as $t \ge \frac{k}{k-1}\alpha + 6$ (resp. $t \ge \frac{2k}{k-1}\alpha + 7$) most significant bits (resp. bits in the middle) are shared between the $p_i$'s (more precise bounds are stated later in this paper). A summary of the comparison of our method with the methods in [9] and [15] can be found in Table 1.

Let's give the main idea of our method with 2 RSA moduli in the case of shared MSBs. Consider the lattice $L$ spanned by the row vectors $v_1$ and $v_2$ of the following matrix:

$$\begin{pmatrix} K & 0 & N_2 \\ 0 & K & -N_1 \end{pmatrix}$$
Table 1. Comparison of our results against the results of [9] and [15] with $k$ RSA moduli

| | May, Ritzenhofen's results [9] | Sarkar, Maitra's results [15] | Our results |
|---|---|---|---|
| $k = 2$ | When $p_1, p_2$ share $t$ LSBs: rigorous bound of $t \ge 2\alpha + 3$ using 2-dimensional lattices of $\mathbb{Z}^2$. | When $p_1, p_2$ share either $t$ LSBs or MSBs: heuristic bound better than $t \ge 2\alpha + 3$ when $\alpha > 0.266n$, and experimentally better when $\alpha > 0.21n$. In the case of $t$ shared bits in the middle, better bound than $t \ge 4\alpha + 7$, but depending on the position of the shared bits. Using 46-dimensional lattices of $\mathbb{Z}^{46}$. | When $p_1, p_2$ share $t$ MSBs: rigorous bound of $t \ge 2\alpha + 3$ using 2-dimensional lattices of $\mathbb{Z}^3$. In the case of $t$ bits shared in the middle: heuristic bound of $t \ge 4\alpha + 7$ using 3-dimensional lattices of $\mathbb{Z}^3$. |
| $k \ge 3$ | When the $p_i$'s all share $t$ LSBs: heuristic bound of $t \ge \frac{k}{k-1}\alpha$ using $k$-dimensional lattices of $\mathbb{Z}^k$. | Cannot be directly applied. | When the $p_i$'s all share $t$ MSBs (resp. bits in the middle): heuristic bound of $t \ge \frac{k}{k-1}\alpha + \delta_k$ (resp. $t \ge \frac{2k}{k-1}\alpha + \delta_k$), with $\delta_k \le 6$ (resp. $\delta_k \le 7$), using $k$-dimensional (resp. $\frac{k(k+1)}{2}$-dimensional) lattices of $\mathbb{Z}^{k(k+1)/2}$. |


Consider also the following vector in $L$:

$$v_0 = q_1 v_1 + q_2 v_2 = (q_1 K, q_2 K, q_1 q_2 (p_2 - p_1))$$

The key observation is that the $t$ shared significant bits of $p_1$ and $p_2$ cancel out in the algebraic relation $q_1 N_2 - q_2 N_1 = q_1 q_2 (p_2 - p_1)$. Furthermore, we choose $K$ in order to force the coefficients of a shortest vector of $L$ on the basis $(v_1, v_2)$ to be of the order of $2^\alpha \approx q_1 \approx q_2$. We prove in the next section that $v_0$ is indeed a shortest vector of $L$ (thus $N_1$ and $N_2$ can be factored in polynomial time) as soon as $t \ge 2\alpha + 3$. Besides, we generalize this construction to an arbitrary number $k$ of RSA moduli, such that a small vector of the lattice harnesses the same algebraic relation, and to shared middle bits. However, the generalized constructions in both cases become heuristic: we use the Gaussian heuristic to find a condition on $t$ for this vector to be a shortest vector of the lattice.

Applications of implicit factoring have not yet been extensively studied, and we believe that they will develop. The introduction of [9] gives some ideas for possible applications. They include destructive applications with malicious manipulation of public key generators, as well as possibly constructive ones. Indeed, our work shows that when $t \ge 2\alpha + 3$, it is as hard to factorize $N_1 = p_1 q_1$ as to generate $N_2 = p_2 q_2$ with $p_2$ sharing $t$ most significant bits with $p_1$. This problem could form the basis of a cryptographic primitive.

Throughout this paper, we heavily use common results on Euclidean lattices. A summary of these results can be found in Appendix A. The paper is organized as follows. In Section 2, we present our rigorous method in the case of shared MSBs and two RSA moduli; we generalize it to $k$ RSA moduli in Section 3. In Section 4, we present our method in the case of shared bits in the middle. Finally, in Section 5, we present our experiments, which strongly support the assumption we made in the case of $k$ RSA moduli and of shared middle bits.


2 Implicit Factoring of Two RSA Moduli with Shared MSBs 

In this section, we study the problem of factoring two $n$-bit RSA moduli $N_1 = p_1 q_1$ and $N_2 = p_2 q_2$, where $q_1$ and $q_2$ are $\alpha$-bit primes, given only the implicit hint that $p_1$ and $p_2$ share $t$ most significant bits (MSBs) that are unknown to us. We will show that $N_1$ and $N_2$ can be factored in quadratic time as soon as $t \ge 2\alpha + 3$. By saying that the primes $p_1, p_2$ of maximal bit-size $n - \alpha + 1$ share $t$ MSBs, we really mean that $|p_1 - p_2| < 2^{n-\alpha+1-t}$.

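Let's consider the lattice $L$ spanned by the row vectors (denoted by $v_1$ and $v_2$) of the following matrix:

$$M = \begin{pmatrix} K & 0 & N_2 \\ 0 & K & -N_1 \end{pmatrix}$$

where $K = \lfloor 2^{n-t+\frac{1}{2}} \rfloor$.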

We have the following immediate lemma that makes our method work:

Lemma 1. Let $v_0$ be the vector of $L$ defined by $v_0 = q_1 v_1 + q_2 v_2$. Then $v_0$ can be rewritten as $v_0 = (q_1 K, q_2 K, q_1 q_2 (p_2 - p_1))$.

Note that the shared MSBs of $p_1$ and $p_2$ cancel each other out in the difference $p_2 - p_1$. Each of the coefficients of $v_0$ is thus an integer of roughly $n + \alpha - t$ bits. Provided that $t$ is sufficiently large, $\pm v_0$ may be a shortest vector of $L$, which can then be found using Lagrange reduction on $L$. Moreover, as soon as we retrieve $v_0$ from $L$, factoring $N_1$ and $N_2$ is easily done by dividing the first two coordinates of $v_0$ by $K$ (which can be done in quadratic time in $n$). Proving that $v_0$ is a shortest vector of $L$ under some conditions on $t$ is therefore sufficient to factorize $N_1$ and $N_2$.

We first give an intuition on the bound on $t$ that we can expect, and then give a proof that $\pm v_0$ is indeed the shortest vector of $L$ under a similar condition.

The volume of $L$ is the square root of the determinant of the Gram matrix of $L$, given by

$$M M^T = \begin{pmatrix} K^2 + N_2^2 & -N_1 N_2 \\ -N_1 N_2 & K^2 + N_1^2 \end{pmatrix},$$

that is, $\mathrm{Vol}(L) = \sqrt{K^2 N_1^2 + K^2 N_2^2 + K^4}$, which can be approximated by $2^{2n-t}$ because $K^2 \approx 2^{2(n-t)}$ is small compared to $N_1^2 \approx N_2^2 \approx 2^{2n}$. The norm of $v_0$ is approximately $2^{n+\alpha-t}$ because each of its coefficients has roughly $n + \alpha - t$ bits. If $v_0$ is a shortest vector of $L$, it must be smaller than the Minkowski bound applied to $L$: $2^{n+\alpha-t} \approx \|v_0\| \le \sqrt{2}\,\mathrm{Vol}(L)^{1/2} \approx 2^{n-t/2}$, which happens when $t \ge 2\alpha$. The following lemma shows that $v_0$ is indeed a shortest vector of $L$ under a similar condition on $t$.

Lemma 2. Let $L$ be the lattice generated by the row vectors $v_1$ and $v_2$ of $M$, and let $v_0 = q_1 v_1 + q_2 v_2 = (q_1 K, q_2 K, q_1 q_2 (p_2 - p_1))$ as defined in Lemma 1. The vector $\pm v_0$ is the shortest vector of the lattice $L$ as soon as $t \ge 2\alpha + 3$.
Proof. Let $(b_1, b_2)$ be the basis resulting from the Lagrange reduction of $L$. This reduced basis satisfies $\|b_1\| = \lambda_1(L)$ and $\|b_2\| = \lambda_2(L)$, and, by Hadamard's inequality, $\|b_1\|\,\|b_2\| \ge \mathrm{Vol}(L)$. As $v_0$ is in the lattice, $\|b_1\| = \lambda_1(L) \le \|v_0\|$. Hence we get $\|b_2\| \ge \frac{\mathrm{Vol}(L)}{\|v_0\|}$. Moreover, if $v_0$ is strictly shorter than $b_2$, then $v_0$ is a multiple of $b_1$, for otherwise $b_2$ would not be the second minimum of the lattice. In this case, $v_0 = a b_1 = a(b v_1 + c v_2)$ with $a, b, c \in \mathbb{Z}$, and looking at the first two coefficients of $v_0$, we get that $ab = q_1$ and $ac = q_2$. Since the $q_i$'s are distinct primes, we conclude that $a = \pm 1$, that is, $v_0 = \pm b_1$. Using the previous inequality, a condition for $v_0$ to be strictly shorter than $b_2$ is:

$$\|v_0\|^2 < \mathrm{Vol}(L) \qquad (1)$$

Let's upper-bound the norm of $v_0$ and lower-bound $\mathrm{Vol}(L)$. We first provide simple bounds that prove the lemma when $t \ge 2\alpha + 4$, and then derive tighter bounds that require only $t \ge 2\alpha + 3$.

The $p_i$'s have at most $n - \alpha + 1$ bits, and they share their $t$ most significant bits, so $|p_2 - p_1| < 2^{n-\alpha+1-t}$. We thus have the inequality $\|v_0\|^2 < 2^{2(n-t)+1}(q_1^2 + q_2^2) + q_1^2 q_2^2 (p_1 - p_2)^2$, which implies

$$\|v_0\|^2 < 2^{2(n+\alpha-t)+2} + 2^{2(\alpha+n+1-t)} = 2^{2(n+\alpha-t)+3} \qquad (2)$$

We can lower-bound the volume of $L$, using that $N_1, N_2 \ge 2^{n-1}$ and $K^2 \ge 2^{2(n-t)}$:

$$\mathrm{Vol}(L)^2 = K^2 (N_1^2 + N_2^2 + K^2) \ge 2^{4n-2t-1} \qquad (3)$$

Using inequalities (2) and (3), inequality (1) is true provided that $2^{2(n+\alpha-t)+3} < 2^{2n-t-\frac{1}{2}}$, which is equivalent to (as $t$ and $\alpha$ are integers):

$$t \ge 2\alpha + 4 \qquad (4)$$

We have thus proved the lemma under condition (4). We now refine the bounds on $\|v_0\|$ and $\mathrm{Vol}(L)$ in order to prove the tight case.

The integers $q_1$ and $q_2$ are $\alpha$-bit primes, therefore $q_i \le 2^\alpha - 1$ ($i = 1, 2$). Define $\varepsilon_1$ by $2^\alpha - 1 = 2^{\alpha-\varepsilon_1}$. We get $q_i^2 \le 2^{2\alpha-2\varepsilon_1}$ ($i = 1, 2$). Moreover, since $K = \lfloor 2^{n-t+\frac{1}{2}} \rfloor$, we have $K^2 \le 2^{2(n-t)+1}$. From these inequalities, we can upper-bound $K^2 q_i^2$:

$$K^2 q_i^2 \le 2^{2(n-t+\alpha)+1-2\varepsilon_1} \quad (i = 1, 2) \qquad (5)$$

The $p_i$'s have at most $n - \alpha + 1$ bits and they share $t$ bits, so $(p_2 - p_1)^2 < 2^{2(n-\alpha+1-t)}$. Thus, using the upper bound on the $q_i^2$, we have

$$q_1^2 q_2^2 (p_2 - p_1)^2 < 2^{2(n-t+\alpha+1-2\varepsilon_1)} \qquad (6)$$

We can finally bound $\|v_0\|^2 = K^2 (q_1^2 + q_2^2) + q_1^2 q_2^2 (p_2 - p_1)^2$ using (5) and (6):

$$\|v_0\|^2 < 2^{2(n+\alpha-t)+2-2\varepsilon_1} + 2^{2(n+\alpha-t)+2-4\varepsilon_1} < 2^{2(n+\alpha-t)+3-\varepsilon_1} \qquad (7)$$
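Let's now define $\varepsilon_2$ by the equality $\lfloor 2^{n-t+\frac{1}{2}} \rfloor = 2^{n-t+\frac{1}{2}-\varepsilon_2}$. We have $K = 2^{n-t+\frac{1}{2}-\varepsilon_2}$ and $N_i^2 \ge 2^{2n-2}$, so we can lower-bound $\mathrm{Vol}(L)^2$:

$$\mathrm{Vol}(L)^2 = K^2 (N_1^2 + N_2^2 + K^2) > K^2 (N_1^2 + N_2^2) \ge 2^{4n-2t-2\varepsilon_2} \qquad (8)$$

Using inequalities (7) and (8), condition (1) is true under the new condition $2^{2(n+\alpha-t)+3-\varepsilon_1} < 2^{2n-t-\varepsilon_2}$, which is equivalent to $t > 2\alpha + 3 + \varepsilon_2 - \varepsilon_1$.

Since $\varepsilon_1 = \log_2\left(\frac{2^\alpha}{2^\alpha - 1}\right)$, $\varepsilon_2 = \log_2\left(\frac{2^{n-t+\frac{1}{2}}}{\lfloor 2^{n-t+\frac{1}{2}} \rfloor}\right)$ and $\alpha < n - t$, we have $\varepsilon_2 < \varepsilon_1$. Hence $t \ge 2\alpha + 3$ implies $t > 2\alpha + 3 + \varepsilon_2 - \varepsilon_1$, and the result follows.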

From the preceding Lemmas 1 and 2, one can deduce the following result.

Theorem 1. Let $N_1 = p_1 q_1$, $N_2 = p_2 q_2$ be two $n$-bit RSA moduli, where the $q_i$'s are $\alpha$-bit primes and the $p_i$'s are primes that share $t$ most significant bits. If $t \ge 2\alpha + 3$, then $N_1$ and $N_2$ can be factored in quadratic time in $n$.

Proof. Let $L$ be the lattice generated by $v_1$ and $v_2$ as above. Since the norms of $v_1$ and $v_2$ are bounded by $2^{n+1}$, computing the reduced basis $(b_1, b_2)$ takes quadratic time in $n$. By Lemma 2, we know that $b_1 = \pm v_0$ as soon as $t \ge 2\alpha + 3$. The factorization of $N_1$ and $N_2$ follows from the description of $v_0$ given by Lemma 1.
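The two-moduli attack is small enough to try end to end. The following Python sketch is our own toy illustration, not the authors' Magma implementation. It generates a hypothetical instance with $n = 512$, $\alpha = 64$ and $t = 140 \ge 2\alpha + 3$, builds the basis $v_1 = (K, 0, N_2)$, $v_2 = (0, K, -N_1)$ with $K = \lfloor 2^{n-t+1/2} \rfloor$, runs Lagrange (Gauss) reduction, and reads $q_1, q_2$ off the first two coordinates of the shortest basis vector. Primality is checked with a Miller-Rabin test on fixed bases, which is only probabilistic at these sizes. All helper names and parameter choices are assumptions made for the example.

```python
import math
import random

BASES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]


def is_probable_prime(n: int) -> bool:
    """Miller-Rabin test with fixed bases (probabilistic for large n)."""
    if n < 2:
        return False
    for p in BASES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True


def random_prime(bits: int, rng: random.Random) -> int:
    """Random (probable) prime with exactly `bits` bits."""
    while True:
        c = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(c):
            return c


def make_instance(n: int, alpha: int, t: int, rng: random.Random):
    """Toy instance: alpha-bit primes q1 != q2 and (n - alpha)-bit primes
    p1 != p2 whose t most significant bits coincide."""
    bits = n - alpha
    low = bits - t                          # number of free low-order bits
    p1 = random_prime(bits, rng)
    high = (p1 >> low) << low               # the t shared MSBs of p1
    while True:
        p2 = high | rng.getrandbits(low) | 1
        if p2 != p1 and is_probable_prime(p2):
            break
    q1 = random_prime(alpha, rng)
    q2 = random_prime(alpha, rng)
    while q2 == q1:
        q2 = random_prime(alpha, rng)
    return p1 * q1, p2 * q2, q1, q2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lagrange_reduce(b1, b2):
    """Lagrange (Gauss) reduction of a rank-2 integer basis.
    Returns (b1, b2) with ||b1|| = lambda_1 and ||b2|| = lambda_2."""
    if dot(b1, b1) > dot(b2, b2):
        b1, b2 = b2, b1
    while True:
        n1 = dot(b1, b1)
        # mu = round(<b1, b2> / <b1, b1>), computed with exact integers
        mu = (2 * dot(b1, b2) + n1) // (2 * n1)
        b2 = [y - mu * x for x, y in zip(b1, b2)]
        if dot(b2, b2) >= n1:
            return b1, b2
        b1, b2 = b2, b1


def implicit_factor_two(N1: int, N2: int, n: int, t: int):
    """Return (q1, q2) read off the shortest vector, or None on failure."""
    K = math.isqrt(2 ** (2 * (n - t) + 1))  # K = floor(2^(n - t + 1/2))
    b1, _ = lagrange_reduce([K, 0, N2], [0, K, -N1])
    if b1[0] % K or b1[1] % K:
        return None
    q1, q2 = abs(b1[0]) // K, abs(b1[1]) // K
    if q1 > 1 and q2 > 1 and N1 % q1 == 0 and N2 % q2 == 0:
        return q1, q2
    return None


if __name__ == "__main__":
    n, alpha, t = 512, 64, 140              # t >= 2 * alpha + 3 = 131
    N1, N2, q1, q2 = make_instance(n, alpha, t, random.Random(2010))
    print(implicit_factor_two(N1, N2, n, t) == (q1, q2))
```

With these sizes, the bounds (2) and (3) leave a margin of several bits in condition (1), so the reduction should return $\pm v_0$. Lowering $t$ toward $2\alpha + 3$ is a simple way to probe how tight the bound is in practice.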

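Remark 1. For our analysis, the value $K = \lfloor 2^{n-t+\frac{1}{2}} \rfloor$ is indeed the best possible value. If we use $K = \lfloor 2^{n-t+\gamma} \rfloor$, we obtain the bound $t \ge 2\alpha + f(\gamma)$ with $f(\gamma) = \frac{3}{2} - \gamma + \log_2(2 + 2^{2\gamma})$. The minimum of $f$ is 3 and is attained at $\gamma = \frac{1}{2}$.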
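A quick numerical check of Remark 1: the sketch below evaluates $f(\gamma) = \frac{3}{2} - \gamma + \log_2(2 + 2^{2\gamma})$ on a grid and confirms that the minimum is 3, attained at $\gamma = \frac{1}{2}$. The grid bounds and step are arbitrary choices of ours. Analytically, setting the derivative $-1 + \frac{2 \cdot 2^{2\gamma}}{2 + 2^{2\gamma}}$ to zero gives $2^{2\gamma} = 2$, which is the same point.

```python
import math


def f(gamma: float) -> float:
    """f(gamma) = 3/2 - gamma + log2(2 + 2^(2*gamma)) from Remark 1."""
    return 1.5 - gamma + math.log2(2 + 2 ** (2 * gamma))


def grid_argmin(func, lo: float = -2.0, hi: float = 3.0, steps: int = 50001):
    """Minimise func over an evenly spaced grid on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    best = min(xs, key=func)
    return best, func(best)


if __name__ == "__main__":
    gamma, value = grid_argmin(f)
    print(gamma, value)
```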

3 Implicit Factoring of k RSA Moduli with Shared MSBs 

The construction of the lattice for 2 RSA moduli naturally generalizes to an arbitrary number $k$ of moduli. Similarly, we show that a short vector $v_0$ of the lattice allows us to recover the factorization of the $N_i$'s. This vector takes advantage of the relations $q_i N_j - q_j N_i = q_i q_j (p_j - p_i)$ for all $i, j \in \{1, \dots, k\}$. However, we were unable to prove that $v_0$ is a shortest vector of the lattice. Therefore, our method relies on the Gaussian heuristic to estimate the conditions under which $v_0$ should be a shortest vector of the lattice. Experimental data in Section 5 confirm that this heuristic is valid in nearly all cases.

In this section, we are given $k$ RSA moduli of $n$ bits, $N_1 = p_1 q_1, \dots, N_k = p_k q_k$, where the $q_i$'s are $\alpha$-bit primes and the $p_i$'s are primes that all share $t$ most significant bits.

Let us construct a matrix $M$ whose row vectors will form a basis of a lattice $L$; this matrix will have $k$ rows and $k + \binom{k}{2} = \frac{k(k+1)}{2}$ columns. Denote by $s_1, \dots, s_m$, with $m = \binom{k}{2}$, all the subsets of cardinality 2 of $\{1, 2, \dots, k\}$. To each of the $s_i$'s, associate a column vector $c_i$ of size $k$ in the following way. Let $a, b$ be the two elements of $s_i$, with $a < b$. We set the $a$-th element of $c_i$ to $N_b$, the $b$-th element of $c_i$ to $-N_a$, and all other elements to zero. Finally, one forms $M$ by concatenating column-wise the matrix $K \cdot I_k$, where $I_k$ is the identity matrix of size $k$, with the matrix $C_m$ composed of the $m$ column vectors $c_1, \dots, c_m$. $K$ is chosen to be $\lfloor 2^{n-t+\frac{1}{2}} \rfloor$. We will call $v_1, \dots, v_k$ the row vectors of $M$.
To make things more concrete, consider the example of $k = 4$. Up to a reordering of the columns (which changes nothing in the upcoming analysis),

$$M = \begin{pmatrix}
K & 0 & 0 & 0 & N_2 & N_3 & N_4 & 0 & 0 & 0 \\
0 & K & 0 & 0 & -N_1 & 0 & 0 & N_3 & N_4 & 0 \\
0 & 0 & K & 0 & 0 & -N_1 & 0 & -N_2 & 0 & N_4 \\
0 & 0 & 0 & K & 0 & 0 & -N_1 & 0 & -N_2 & -N_3
\end{pmatrix}, \quad \text{where } K = \lfloor 2^{n-t+\frac{1}{2}} \rfloor \qquad (9)$$

Notice that columns $k + 1$ to $k + m$ correspond to all the 2-subsets of $\{1, 2, 3, 4\}$.

Similarly to the case of 2 RSA moduli (Lemma 1), $L$ contains a short vector that allows us to factorize all the $N_i$'s:

Lemma 3. Let $v_0$ be the vector of $L$ defined by $v_0 = \sum_{i=1}^{k} q_i v_i$. Then $v_0$ can be rewritten as follows:

$$v_0 = \big(q_1 K, \dots, q_k K, \dots, \underbrace{q_a q_b (p_b - p_a)}_{\forall \{a, b\} \subseteq \{1, \dots, k\}}, \dots\big)$$

Proof. For $1 \le i \le m$, let $a, b$ be such that $s_i = \{a, b\}$ and $a < b$. By the construction of the $c_i$'s, we get that the $(k+i)$-th coordinate of $v_0$ is equal to $q_a N_b - q_b N_a = q_a q_b (p_b - p_a)$. □
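The construction of $M$ and the identity of Lemma 3 can be checked mechanically. This Python sketch is illustrative only: the function names and toy values are hypothetical, and since the identity holds for arbitrary integers, no primes are needed. It builds the rows $v_1, \dots, v_k$ with the 2-subsets $\{a, b\}$, $a < b$, in lexicographic order, which for $k = 4$ reproduces the column order of (9), and checks the coordinates of $v_0 = \sum_i q_i v_i$.

```python
from itertools import combinations


def build_basis(N, K):
    """Rows v_1..v_k of M = [K*I_k | C_m] for the moduli N = [N_1, ..., N_k].
    The column of the 2-subset {a, b} (a < b) holds N_b in row a, -N_a in row b."""
    k = len(N)
    pairs = list(combinations(range(k), 2))   # s_1, ..., s_m in lexicographic order
    rows = []
    for i in range(k):
        row = [K if j == i else 0 for j in range(k)]
        for a, b in pairs:
            row.append(N[b] if i == a else (-N[a] if i == b else 0))
        rows.append(row)
    return rows, pairs


def combine(rows, coeffs):
    """Integer combination sum_i coeffs[i] * rows[i]."""
    return [sum(c * r[j] for c, r in zip(coeffs, rows)) for j in range(len(rows[0]))]


if __name__ == "__main__":
    p = [1001, 1003, 1010, 1007]     # toy values: the identity needs no primality
    q = [3, 5, 7, 11]
    K = 16
    N = [pi * qi for pi, qi in zip(p, q)]
    rows, pairs = build_basis(N, K)
    v0 = combine(rows, q)
    k = len(N)
    assert v0[:k] == [qi * K for qi in q]
    assert v0[k:] == [q[a] * q[b] * (p[b] - p[a]) for a, b in pairs]
    print(len(rows[0]))              # number of columns k(k+1)/2, prints 10
```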


Remark that $v_0$ is short because its last $m$ coordinates harness the cancellation of the $t$ most significant bits between the $p_i$'s. Retrieving $\pm v_0$ from $L$ leads immediately to the factorization of all the $N_i$'s, by dividing its first $k$ coordinates by $K$.


Assumption 1. If $\pm v_0$ is shorter than the Gaussian heuristic $\lambda_1(L) \approx \sqrt{\frac{d}{2\pi e}}\,\mathrm{Vol}(L)^{1/d}$ applied to the $d$-dimensional lattice $L$, then it is a shortest vector of $L$.



This assumption is supported by experimental data in Section 5. We found it to be almost always true in practice. This condition can be seen as an analog of condition (1) of Section 2 in the case of two RSA moduli.

Let's derive a bound on $t$ so that $v_0$ is smaller than the Gaussian heuristic applied to $L$. The norm of $v_0$ can be computed and upper-bounded easily:

$$\|v_0\|^2 = K^2 \sum_{i=1}^{k} q_i^2 + \sum_{\{i, j\} \subseteq \{1, \dots, k\}} q_i^2 q_j^2 (p_i - p_j)^2 < k^2\, 2^{2(n+\alpha-t)+1}.$$

Computing the volume of $L$ is a bit more involved; we refer to Lemma 5 of Appendix B:

$$\mathrm{Vol}(L) = K \left(K^2 + \sum_{i=1}^{k} N_i^2\right)^{\frac{k-1}{2}},$$

and thus $\mathrm{Vol}(L) > 2^{n-t} \left(\sqrt{k}\, 2^{n-1}\right)^{k-1}$.
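The volume formula can also be checked exactly on small examples. The sketch below is our own, and its names and toy values are hypothetical. It builds $M$ as described above, computes the Gram determinant $\det(MM^T) = \mathrm{Vol}(L)^2$ with fraction-free Bareiss elimination, and compares it with $K^2 (K^2 + \sum_i N_i^2)^{k-1}$. The identity follows from $MM^T = (K^2 + S) I_k - \mathbf{N}\mathbf{N}^T$ with $S = \sum_i N_i^2$ and $\mathbf{N} = (N_1, \dots, N_k)^T$, whose eigenvalues are $K^2 + S$ (with multiplicity $k - 1$) and $K^2$.

```python
from itertools import combinations


def build_basis(N, K):
    """Rows of M = [K*I_k | C_m] as in Section 3 (2-subsets in lexicographic order)."""
    k = len(N)
    rows = []
    for i in range(k):
        row = [K if j == i else 0 for j in range(k)]
        for a, b in combinations(range(k), 2):
            row.append(N[b] if i == a else (-N[a] if i == b else 0))
        rows.append(row)
    return rows


def gram(rows):
    """Gram matrix M M^T."""
    return [[sum(x * y for x, y in zip(u, v)) for v in rows] for u in rows]


def det_bareiss(A):
    """Exact determinant of an integer matrix by fraction-free (Bareiss) elimination."""
    A = [row[:] for row in A]
    n = len(A)
    sign, prev = 1, 1
    for k in range(n - 1):
        if A[k][k] == 0:
            swap = next((i for i in range(k + 1, n) if A[i][k] != 0), None)
            if swap is None:
                return 0
            A[k], A[swap] = A[swap], A[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # exact division is guaranteed by the Bareiss recurrence
                A[i][j] = (A[i][j] * A[k][k] - A[i][k] * A[k][j]) // prev
        prev = A[k][k]
    return sign * A[n - 1][n - 1]


if __name__ == "__main__":
    N = [3003, 5015, 7070, 11077, 12987]   # toy moduli
    K = 16
    for k in range(2, 6):
        vol_sq = det_bareiss(gram(build_basis(N[:k], K)))
        S = sum(x * x for x in N[:k])
        assert vol_sq == K**2 * (K**2 + S) ** (k - 1)
    print("Vol(L)^2 = K^2 (K^2 + sum N_i^2)^(k-1) on all toy cases")
```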

We now seek the condition on $t$ for the norm of $v_0$ to be smaller than the Gaussian heuristic. Using the two previous inequalities on $\|v_0\|$ and $\mathrm{Vol}(L)$, we get the stricter condition:

$$k^2\, 2^{2(n+\alpha-t)+1} < \frac{k}{2\pi e} \left(2^{n-t} \left(\sqrt{k}\, 2^{n-1}\right)^{k-1}\right)^{\frac{2}{k}}$$

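Expanding everything and extracting $t$, we get the following condition:
$$t > \frac{k}{k-1}\,\alpha + 1 + \frac{k}{2(k-1)}\left(2 + \frac{\log_2 k}{k} + \log_2(\pi e)\right). \tag{10}$$
When $k \geq 3$, we can derive a simpler and stricter bound on $t$: $t > \frac{k}{k-1}\,\alpha + 6$.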
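As a sanity check of (10), the sketch below evaluates its additive constant and the smallest integer $t$ satisfying it. Under these formulas it reproduces the constants $5.2\ldots$, $4.01\ldots$ and $3.68\ldots$ and the "theoretical bound $t$" column of Table 3 in Section 5. The function names are ours.

```python
import math


def msb_constant(k):
    """Additive constant in (10): 1 + k/(2(k-1)) * (2 + log2(k)/k + log2(pi e))."""
    return 1 + k / (2 * (k - 1)) * (2 + math.log2(k) / k + math.log2(math.pi * math.e))


def msb_min_t(k, alpha):
    """Smallest integer t satisfying the strict inequality (10)."""
    return math.floor(k / (k - 1) * alpha + msb_constant(k)) + 1


for k in (3, 10, 40):
    print(k, round(msb_constant(k), 2), [msb_min_t(k, a) for a in (150, 200, 250)])
# 3 5.22 [231, 306, 381]
# 10 4.01 [171, 227, 282]
# 40 3.68 [158, 209, 261]
```

The constant decreases with $k$ and stays below 6 for every $k \geq 3$, which is where the simpler bound $t > \frac{k}{k-1}\alpha + 6$ comes from.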
Finally, as $\pm v_0$ is now the shortest vector of $L$ under Assumption 1, it can be found in time $\mathcal{C}\left(k, \frac{k(k+1)}{2}, n\right)$, where $\mathcal{C}(k, s, B)$ denotes the time to find a shortest vector of a $k$-dimensional lattice of $\mathbb{Z}^s$ given by $B$-bit basis vectors. We have just proved the following theorem:

Theorem 2. Let $N_1 = p_1q_1, \ldots, N_k = p_kq_k$ be $k$ $n$-bit RSA moduli, with the $q_i$'s being $\alpha$-bit primes and the $p_i$'s being primes that all share $t$ most significant bits. Under Assumption 1, the $N_i$'s can be factored in time $\mathcal{C}\left(k, \frac{k(k+1)}{2}, n\right)$ as soon as $t$ satisfies equation (10).

Remark 2. Note that we can find a shortest vector of the lattice of Theorem 2 using Kannan's algorithm (Theorem 6 in Appendix A) in time $k^{\frac{k}{2e}+o(k)}\,\mathcal{P}(n)$, where $\mathcal{P}$ is a polynomial. This implies that we can factorize all of $N_1, \ldots, N_k$ in time polynomial in $n$ as soon as $k$ is constant or $k^k$ is polynomial in $n$. Unfortunately, to the best of our knowledge, this algorithm is not implemented in the computer algebra system Magma [1], on which we implemented the methods. In our experiments, to compute a shortest vector of the lattice, we instead used the Schnorr-Euchner enumeration algorithm, which is known (see [?]) to perform well in small dimension ($\leq 50$); this step took less than 1 minute in Magma for $k \leq 40$. One may also reduce the lattice using the LLL algorithm instead of Schnorr-Euchner enumeration. If $t$ is not too close to the bound of Theorem 2, the Gaussian heuristic suggests that the gap (see Definition 1 in the appendix) of the lattice is large, and thus LLL may be able to find a shortest vector of $L$ even in medium dimension (50 to 200).
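To illustrate the LLL route on a toy scale, the following sketch reduces the lattice (9) for $k = 3$ with a textbook LLL ($\delta = 3/4$, exact rational arithmetic) and reads the $q_i$ off the first reduced vector. This is not the Magma code used for the experiments. The generator, the helper names and the parameters ($n = 100$, $\alpha = 10$, $t = 60$, far above bound (10)) are assumptions, and the $p_i$ are not forced to be prime, since primality plays no role in the lattice step. With such a margin one expects the first reduced vector to be $\pm v_0$.

```python
import math
import random
from fractions import Fraction
from itertools import combinations


def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL with exact rational Gram-Schmidt data, recomputed after every
    basis change: slow, but simple and adequate for very small dimensions."""
    b = [list(v) for v in basis]
    d = len(b)

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def gso():
        bstar, mu = [], [[Fraction(0)] * d for _ in range(d)]
        for i in range(d):
            w = [Fraction(x) for x in b[i]]
            for j in range(i):
                mu[i][j] = dot(b[i], bstar[j]) / dot(bstar[j], bstar[j])
                w = [x - mu[i][j] * y for x, y in zip(w, bstar[j])]
            bstar.append(w)
        return bstar, mu

    bstar, mu = gso()
    k = 1
    while k < d:
        for j in range(k - 1, -1, -1):  # size-reduce b_k against b_{k-1}, ..., b_0
            r = round(mu[k][j])
            if r:
                b[k] = [x - r * y for x, y in zip(b[k], b[j])]
                bstar, mu = gso()
        if dot(bstar[k], bstar[k]) >= (delta - mu[k][k - 1] ** 2) * dot(bstar[k - 1], bstar[k - 1]):
            k += 1  # Lovasz condition holds
        else:
            b[k - 1], b[k] = b[k], b[k - 1]
            bstar, mu = gso()
            k = max(k - 1, 1)
    return b


def toy_msb_instance(k, n, alpha, t, seed):
    """Hypothetical generator: distinct alpha-bit primes q_i and (n - alpha)-bit
    integers p_i sharing their t most significant bits."""
    rng = random.Random(seed)
    qs = set()
    while len(qs) < k:
        c = rng.randrange(2 ** (alpha - 1), 2 ** alpha)
        if all(c % f for f in range(2, math.isqrt(c) + 1)):  # trial division
            qs.add(c)
    q = sorted(qs)
    low = n - alpha - t  # free low-order bits of each p_i
    top = rng.randrange(2 ** (t - 1), 2 ** t)
    p = [(top << low) | rng.getrandbits(low) for _ in range(k)]
    return [pi * qi for pi, qi in zip(p, q)], q


def msb_attack(N, n, t):
    """Reduce the lattice (9) with LLL and read the q_i off the first basis vector."""
    k = len(N)
    K = math.isqrt(2 ** (2 * (n - t) + 1))  # floor(2^(n - t + 1/2))
    rows = []
    for i in range(k):
        row = [K if j == i else 0 for j in range(k)]
        row += [N[b] if i == a else -N[a] if i == b else 0
                for a, b in combinations(range(k), 2)]
        rows.append(row)
    v = lll(rows)[0]  # expected to be +-v0
    return [abs(x) // K for x in v[:k]]


N, q = toy_msb_instance(k=3, n=100, alpha=10, t=60, seed=1)
print(msb_attack(N, n=100, t=60) == q)
```

Recomputing the Gram-Schmidt data from scratch keeps the sketch short and exact; a floating-point LLL such as the one in Magma would be used at realistic sizes.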

Similarly to the case of two RSA moduli, $K = \lfloor 2^{n-t+\frac{1}{2}} \rceil$ is optimal for our analysis. Indeed, if we redo the analysis with $K = \lfloor 2^{n-t+\gamma} \rceil$, we find that the optimal value for $\gamma$ is the one that minimizes the function $f_k : \gamma \mapsto \frac{k}{2}\log_2\left(k - 1 + 2^{2\gamma-1}\right) - \gamma$, which is $\gamma = \frac{1}{2}$ regardless of $k$.
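The claim about $\gamma$ can be checked directly: $f_k'(\gamma) = \frac{k\,2^{2\gamma-1}}{k-1+2^{2\gamma-1}} - 1$ vanishes exactly when $2^{2\gamma-1} = 1$, and $f_k$ is convex. The short numerical sketch below (names are ours) locates the minimizer by a grid search.

```python
import math


def f(k, gamma):
    """f_k(gamma) = (k/2) * log2(k - 1 + 2^(2 gamma - 1)) - gamma."""
    return k / 2 * math.log2(k - 1 + 2 ** (2 * gamma - 1)) - gamma


def argmin_gamma(k, lo=-2.0, hi=3.0, steps=5000):
    """Grid search for the minimizer of f_k on [lo, hi]."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda g: f(k, g))


print([argmin_gamma(k) for k in (2, 3, 10, 40)])  # [0.5, 0.5, 0.5, 0.5]
```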

Finally, note that a slightly tighter bound (differing from equation (10) by a small additive constant) may be attained by bounding $\|v_0\|$ and $\mathrm{Vol}(L)$ more precisely.

4 Implicit Factoring with Shared Bits in the Middle 

In this section, we are given $k$ RSA moduli of $n$ bits, $N_1 = p_1q_1, \ldots, N_k = p_kq_k$, where the $q_i$'s are $\alpha$-bit primes and the $p_i$'s are primes that all share $t$ bits from position $t_1$ to $t_2 = t_1 + t$. More precisely, these RSA moduli all verify
$$N_i = p_iq_i = \left(p_{i_2}\,2^{t_2} + p\,2^{t_1} + p_{i_0}\right)q_i,$$

where $p$ is the integer part shared by all the moduli. Contrary to the LSB case presented in [9] and the MSB case developed in the previous sections, the method we present here is heuristic even when $k = 2$. We first sketch our method when $k = 2$ and present the details of the general result later. When $k = 2$, we have a system of two equations in four variables $p_1, q_1, p_2, q_2$: $N_1 = p_1q_1 = (p_{1_2}2^{t_2} + p\,2^{t_1} + p_{1_0})q_1$ and $N_2 = p_2q_2 = (p_{2_2}2^{t_2} + p\,2^{t_1} + p_{2_0})q_2$. Similarly to the LSB case (see [9]), this system can be reduced modulo $2^{t_2}$. One obtains a system of two equations in the 5 variables $p, p_{1_0}, p_{2_0}, q_1, q_2$:
$$\begin{cases} (p\,2^{t_1} + p_{1_0})\,q_1 \equiv N_1 \pmod{2^{t_2}} \\ (p\,2^{t_1} + p_{2_0})\,q_2 \equiv N_2 \pmod{2^{t_2}} \end{cases} \tag{11}$$
The problem can now be seen as a modular implicit factorization of $N_1$ and $N_2$ with shared MSBs. Thus, we adapt the method we proposed in Section 2 to the modular case. More precisely, we consider the lattice $L$ defined by the rows of the matrix
$$M = \begin{pmatrix} K & 0 & N_2 \\ 0 & K & -N_1 \\ 0 & 0 & 2^{t_2} \end{pmatrix} \tag{12}$$


Let $v_0$ be the vector $(q_1K, q_2K, r)$, with $r$ being the unique remainder of $q_1N_2 - q_2N_1$ modulo $2^{t_2}$ in $]-2^{t_2-1}, 2^{t_2-1}]$. Clearly, $v_0$ is in $L$: it is $q_1$ times the first row plus $q_2$ times the second row plus a suitable multiple of the third row. As in Section 3, we search for a condition on the integer $t$ under which $\pm v_0$ is the shortest vector in $L$ under Assumption 1 (here, the dimension of the lattice $L$ is 3). The integer $K$ will be set at the end of the analysis.

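We have $\|v_0\|^2 = K^2(q_1^2 + q_2^2) + r^2$ and
$$]-2^{t_2-1}, 2^{t_2-1}] \ni r \equiv q_1N_2 - q_2N_1 \equiv q_1q_2\,(p_{2_0} - p_{1_0}) \pmod{2^{t_1+t}},$$
with $|p_{2_0} - p_{1_0}| < 2^{t_1}$ and $q_i < 2^{\alpha}$; since $r$ is the representative of smallest absolute value, $|r| < 2^{2\alpha+t_1}$. Thanks to the upper-triangular shape of $M$, the volume of $L$ is easily computed: $\mathrm{Vol}(L) = K^2\,2^{t_2}$. Thus, we can respectively upper-bound $\|v_0\|^2$ and lower-bound $\mathrm{Vol}(L)$ by $2^{2\alpha+1}K^2 + 2^{2t_1+4\alpha}$ and $K^2\,2^{t_2}$; a condition on $t$ so that $v_0$ is smaller than the Gaussian heuristic follows:
$$2^{2\alpha+1}K^2 + 2^{2t_1+4\alpha} < \frac{3}{2\pi e}\left(K^2\,2^{t_2}\right)^{\frac{2}{3}}.$$
This condition is equivalent to
$$t > \frac{3}{2}\log_2\left(2^{2\alpha+1}K^{\frac{2}{3}} + 2^{2t_1+4\alpha}K^{-\frac{4}{3}}\right) + \frac{3}{2}\log_2\left(\frac{2\pi e}{3}\right) - t_1,$$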


and the integer value of $K$ that minimizes the right-hand side of this inequality is $K = 2^{\alpha+t_1}$. Hence, under Assumption 1, one can factorize $N_1, N_2$ in polynomial time as soon as
$$t > 4\alpha + \frac{3}{2}\left(1 + \log_2(\pi e)\right). \tag{13}$$
A stricter and simpler condition on $t$ is $t > 4\alpha + 7$.

We now inspect when Assumption 1 is not verified, that is, we study the possible existence of exceptional short vectors in $L$ that are shorter than $v_0$. These vectors may appear when there exist small coefficients $c_1, c_2$ ($< 2^{\alpha}$) such that $c_1N_1 - c_2N_2 \bmod 2^{t_2}$ is small (say $\approx 2^{t_2-\gamma}$). In particular, to simplify the analysis, we examine the case where the simple vector $v_1$ defined by $c_1 = c_2 = 1$ is shorter than $v_0$. The inequality $\|v_1\|^2 < \|v_0\|^2$ is equivalent to $t - \gamma < 2\alpha$. So this inequality is possible only for small $t$ and large $\gamma$, which can be considered as an exception. In our experiments, these exceptional short vectors (and, in particular, simple vectors $v_1$) almost never appeared in the case $k = 2$ with $t$ satisfying the bound (13).

The method for $k \geq 3$ is a straightforward generalization of the case $k = 2$, using the results of Section 3. Let us consider the lattice $L$ defined by the rows of the matrix $M$ given by
$$M = \begin{pmatrix} K\,I_{k\times k} & C_m \\ 0 & 2^{t_2}\,I_{m\times m} \end{pmatrix},$$
where $C_m$ is the matrix defined in Section 3, formed by the concatenation of $m = \binom{k}{2}$ column vectors of $k$ rows, and $I_{k\times k}$ (resp. $I_{m\times m}$) is the identity matrix of size $k \times k$ (resp. $m \times m$). Thus, $M$ is a square upper triangular matrix of size $(m+k) \times (m+k)$, and the volume of the $(m+k)$-dimensional lattice $L$ is easily computed: $\mathrm{Vol}(L) = K^k\,2^{mt_2}$.

The vector
$$v_0 = \big(q_1K, \ldots, q_kK, \ldots, r_{\{a,b\}}, \ldots\big)_{\forall \{a,b\} \subseteq \{1,\ldots,k\}},$$
with $r_{\{a,b\}}$ defined as the unique remainder of $q_aq_b(p_b - p_a) = q_aN_b - q_bN_a$ modulo $2^{t_2}$ in $]-2^{t_2-1}, 2^{t_2-1}]$, is clearly a vector of $L$. As we did above, we search for a condition on the integer $t$ under which $\pm v_0$ is the shortest vector in $L$ under Assumption 1. The integer $K$ will be set at the end of the analysis to be optimal.
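The following sketch builds the rows of $M$ and the vector $v_0$ for a toy instance, and checks that $v_0$ is an integer combination of the rows and that $\mathrm{Vol}(L)$ is the product of the diagonal entries. The helper names and the numbers ($t_1 = 8$, $t = 24$, $\alpha = 6$) are illustrative assumptions.

```python
import math
from itertools import combinations


def middle_basis(N, K, t2):
    """Rows of M = [[K I_k, C_m], [0, 2^t2 I_m]], with C_m built as in the lattice (9)."""
    k = len(N)
    pairs = list(combinations(range(k), 2))
    m = len(pairs)
    rows = []
    for i in range(k):
        row = [K if j == i else 0 for j in range(k)]
        row += [N[b] if i == a else -N[a] if i == b else 0 for a, b in pairs]
        rows.append(row)
    for s in range(m):
        rows.append([0] * k + [2 ** t2 if j == s else 0 for j in range(m)])
    return rows, pairs


def centered_mod(x, modulus):
    """Representative of x modulo `modulus` in ]-modulus/2, modulus/2]."""
    r = x % modulus
    return r - modulus if r > modulus // 2 else r


def v0_with_coefficients(N, q, K, t2):
    """Build v0 = (q_i K, ..., r_{a,b}, ...) and integer coefficients expressing it in M."""
    rows, pairs = middle_basis(N, K, t2)
    diffs = [q[a] * N[b] - q[b] * N[a] for a, b in pairs]
    r = [centered_mod(x, 2 ** t2) for x in diffs]
    coeffs = list(q) + [(rs - x) // 2 ** t2 for rs, x in zip(r, diffs)]  # exact division
    return rows, [qi * K for qi in q] + r, coeffs


# Toy example (illustrative values): t1 = 8, t = 24 shared middle bits, alpha = 6.
t1, t2 = 8, 32
shared, hi, lo, q = 0xABCDEF, [5, 9, 12], [17, 200, 99], [37, 41, 43]
p = [(h << t2) + (shared << t1) + l for h, l in zip(hi, lo)]
N = [pi * qi for pi, qi in zip(p, q)]
K = 2 ** (6 + t1)  # K = 2^(alpha + t1), the optimal value obtained later in this section
rows, v0, coeffs = v0_with_coefficients(N, q, K, t2)
assert v0 == [sum(c * row[j] for c, row in zip(coeffs, rows)) for j in range(len(v0))]
assert math.prod(rows[i][i] for i in range(len(rows))) == K ** 3 * 2 ** (3 * t2)  # Vol(L)
print(v0[3:])  # [277611, 130462, -178063], i.e. q_a q_b (p_{b_0} - p_{a_0})
```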

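We have $\|v_0\|^2 = K^2(q_1^2 + \cdots + q_k^2) + \sum_{\{a,b\}\subseteq\{1,\ldots,k\}} r_{\{a,b\}}^2$, which we can bound by $\|v_0\|^2 < k\,2^{2\alpha}K^2 + m\,2^{2t_1+4\alpha}$. A condition on $t$, under Assumption 1, follows:
$$k\,2^{2\alpha}K^2 + m\,2^{4\alpha+2t_1} < \frac{m+k}{2\pi e}\left(K^k\,2^{mt_2}\right)^{\frac{2}{m+k}}.$$
This condition is equivalent to
$$t > \frac{m+k}{2m}\left[\log_2\left(k\,2^{2\alpha-\frac{2m}{m+k}t_1}K^{\frac{2m}{m+k}} + m\,2^{4\alpha+\frac{2k}{m+k}t_1}K^{-\frac{2k}{m+k}}\right) + \log_2\left(\frac{2\pi e}{m+k}\right)\right]. \tag{14}$$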


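The value of $K$ that minimizes the right-hand side of this inequality is given by the zero of the derivative of the function $K \mapsto k\,2^{2\alpha-\frac{2m}{m+k}t_1}K^{\frac{2m}{m+k}} + m\,2^{4\alpha+\frac{2k}{m+k}t_1}K^{-\frac{2k}{m+k}}$. Actually, $K$ is given by the solution of the equation
$$\frac{2mk}{m+k}\,2^{2\alpha-\frac{2m}{m+k}t_1}K^{\frac{m-k}{m+k}} = \frac{2km}{m+k}\,2^{4\alpha+\frac{2k}{m+k}t_1}K^{-\frac{m+3k}{m+k}},$$
and thus, after simplification, $K = 2^{\alpha+t_1}$, which is an integer value. A general condition on $t$ becomes
$$t > \frac{m+k}{2m}\left[\log_2\left((m+k)\,2^{2\alpha\frac{2m+k}{m+k}}\right) + \log_2\left(\frac{2\pi e}{m+k}\right)\right],$$
and, substituting $m = \frac{k(k-1)}{2}$, the general result immediately follows.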
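As a numerical cross-check of this optimization, the sketch below evaluates the expression inside the first logarithm of (14) as a function of $\log_2 K$, with $m = k(k-1)/2$, and searches for its minimum. The minimizer is expected at $\log_2 K = \alpha + t_1$, where the logarithm equals $\log_2(m+k) + 2\alpha\frac{2m+k}{m+k}$. Names and grid parameters are ours.

```python
import math


def log2_rhs_term(k, alpha, t1, log2K):
    """log2 of k 2^(2a - 2m t1/(m+k)) K^(2m/(m+k)) + m 2^(4a + 2k t1/(m+k)) K^(-2k/(m+k)),
    with K given through log2 K so that no huge numbers are formed."""
    m = k * (k - 1) // 2
    e1 = math.log2(k) + 2 * alpha - 2 * m * t1 / (m + k) + 2 * m / (m + k) * log2K
    e2 = math.log2(m) + 4 * alpha + 2 * k * t1 / (m + k) - 2 * k / (m + k) * log2K
    top = max(e1, e2)
    return top + math.log2(2 ** (e1 - top) + 2 ** (e2 - top))  # stable log2(2^e1 + 2^e2)


def best_log2K(k, alpha, t1, step=0.01, span=500):
    """Grid search for the minimizing log2 K around alpha + t1."""
    grid = [alpha + t1 + step * i for i in range(-span, span + 1)]
    return min(grid, key=lambda x: log2_rhs_term(k, alpha, t1, x))


for k, alpha, t1 in [(2, 100, 20), (3, 150, 40), (5, 100, 20)]:
    print(k, best_log2K(k, alpha, t1))  # 2 120.0, 3 190.0, 5 120.0
```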


Theorem 3. Let $N_1 = p_1q_1, \ldots, N_k = p_kq_k$ be $k$ $n$-bit RSA moduli, where the $q_i$'s are $\alpha$-bit primes and the $p_i$'s are primes that all share $t$ bits from position $t_1$ to $t_2 = t_1 + t$. Under Assumption 1, the $N_i$'s can be factored in time $\mathcal{C}\left(\frac{k(k+1)}{2}, \frac{k(k+1)}{2}, n\right)$ as soon as
$$t > 2\alpha + \frac{2}{k-1}\,\alpha + \frac{k+1}{2(k-1)}\left(1 + \log_2(\pi e)\right).$$
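The bound of Theorem 3 is easy to evaluate; the sketch below (function names are ours) computes the smallest integer $t$ satisfying it. For $k = 2$ it coincides with (13), and for $k = 5$, $\alpha = 100$ it gives 254, the theoretical bound quoted for Table 4 in Section 5.

```python
import math


def middle_bound(k, alpha):
    """Right-hand side of Theorem 3: 2a + 2a/(k-1) + (k+1)/(2(k-1)) * (1 + log2(pi e))."""
    return (2 * alpha + 2 * alpha / (k - 1)
            + (k + 1) / (2 * (k - 1)) * (1 + math.log2(math.pi * math.e)))


def middle_min_t(k, alpha):
    """Smallest integer t satisfying the strict inequality of Theorem 3."""
    return math.floor(middle_bound(k, alpha)) + 1


print(middle_min_t(5, 100))  # 254
```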

As in the case $k = 2$, we inspect the general case $k \geq 3$ for the existence of exceptional vectors $v_1 = (c_1K, \ldots, c_kK, \ldots, c_jN_i - c_iN_j \bmod 2^{t_2}, \ldots)$ that would disprove Assumption 1, that is, with $c_i$'s ($< 2^{\alpha}$) and $c_jN_i - c_iN_j \bmod 2^{t_2}$ small (say $\approx 2^{t_2-\gamma}$).
The condition under which the simple vector $v_1$ with $c_1 = c_2 = \cdots = c_k = 1$ satisfies $\|v_1\|^2 < \|v_0\|^2$ is given by
$$t - \gamma < \alpha + \frac{1}{2} + \frac{1}{2}\log_2\left(\frac{(k+1)\,2^{2\alpha-1} - 1}{k-1}\right) \approx 2\alpha.$$
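To see how close this threshold is to $2\alpha$, note that it equals $2\alpha + \frac{1}{2}\log_2\frac{k+1}{k-1}$ up to a term of order $2^{-2\alpha}$. The sketch below (names are ours) evaluates it with exact integer arithmetic inside the logarithm.

```python
import math


def simple_vector_threshold(k, alpha):
    """alpha + 1/2 + (1/2) log2(((k+1) 2^(2 alpha - 1) - 1) / (k - 1)): the simple
    vector v_1 can be shorter than v_0 when t - gamma is below this value."""
    num = (k + 1) * 2 ** (2 * alpha - 1) - 1  # exact integer, no overflow
    return alpha + 0.5 + 0.5 * (math.log2(num) - math.log2(k - 1))


for k in (2, 3, 10, 40):
    print(k, round(simple_vector_threshold(k, 100) - 200, 3))
# excess over 2 * alpha: 0.792, 0.5, 0.145, 0.036
```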


Thus, as in the case $k = 2$, for $t$ and $\alpha$ small and $\gamma$ large enough, this type of simple vector may appear. Moreover, the freedom in choosing the $c_i$'s increases with $k$; thus, exceptional vectors may appear more frequently as $k$ grows. This was observed during our experiments.


Remark 3. During our first experiments, our method failed to factor the $N_i$'s in a few cases. After analyzing the random generation functions used in our code, it turned out that the $q_i$'s were randomly generated in the interval $]2^{\alpha-1}, 2^{\alpha}]$. Thus, the probability that many $q_i$'s have exactly size $\alpha$ is high. If, moreover, $\alpha$ is small enough compared to $t_2$ ($\alpha < t_2 = t + t_1$), the corresponding $N_i - N_j \bmod 2^{t_2}$ may be very small. This can be explained by the following fact: some of the most significant bits (and at least the highest bit) of $N_i \bmod 2^{t_2}$ and $N_j \bmod 2^{t_2}$ will be part of the bits shared by the $p_i$'s, and thus they cancel each other in $(N_i - N_j) \bmod 2^{t_2}$. Hence, in this case, we have an exceptional short vector in $L$ and our method fails; on the other hand, if one uses such moduli, an attacker may use this extra information to easily factor them with another method.


5 Experimental Results 

In order to check the validity of Assumption 1 and the quality of our bounds on $t$, we implemented the methods in Magma 2.15 [1].

5.1 Shared MSBs 

We generated many random 1024-bit RSA moduli for various values of $\alpha$ and $t$; we observed that the results were similar for other values of $n$. In the case $k = 2$, we used Lagrange reduction to find a shortest vector of the lattice with certainty, and for $3 \leq k \leq 40$ we compared Schnorr-Euchner's algorithm (which provably outputs a shortest vector of the lattice) with LLL (which gives an exponential approximation of a shortest vector). We used only LLL for $k = 80$.


Table 2. Results for k = 2 and 1024-bit RSA moduli with shared MSBs

α (bit-size of the q_i's)   Bound of Theorem 1: t ≥ 2α + 3   Best experimental t
150                         303                              302
200                         403                              402
250                         503                              502
300                         603                              602
Table 3. Results for k = 3, 10, 40 and 1024-bit RSA moduli with shared MSBs

α (bit-size     Theoretical   Best experimental t   Best experimental t using   Failure rate of
of the q_i's)   bound t       using LLL algo.       Schnorr-Euchner's algo.     Assumption 1

Results for k = 3 (theoretical bound of Theorem 2: t > (3/2)α + 5.2...)
150             231           228                   228                         0% (t = 227)
200             306           303                   303                         0% (t = 302)
250             381           378                   378                         0% (t = 377)
300             456           453                   453                         0% (t = 452)
350             531           528                   528                         0% (t = 527)
400             606           603                   603                         0% (t = 602)

Results for k = 10 (theoretical bound of Theorem 2: t > (10/9)α + 4.01...)
150             171           169                   169                         0% (t = 168)
200             227           225                   225                         3% (t = 224)
250             282           280                   280                         3% (t = 279)
300             338           336                   336                         1% (t = 335)
350             393           391                   391                         2% (t = 390)
400             449           447                   447                         0% (t = 446)

Results for k = 40 (theoretical bound of Theorem 2: t > (40/39)α + 3.68...)
150             158           156                   155                         2% (t = 154)
200             209           208                   207                         3% (t = 206)
250             261           259                   258                         1% (t = 257)
300             312           310                   309                         1% (t = 308)
350             363           362                   361                         0% (t = 360)
400             414           413                   412                         2% (t = 411)


We conducted experiments for $k = 2, 3, 10, 40$ and 80, and for several values of $\alpha$. For specific values of $k$, $\alpha$ and $t$, we say that a test was successful when the first vector of the reduced basis of the lattice was of the form $\pm v_0$ (that is, when it satisfied Assumption 1 in the heuristic case $k \geq 3$). For each $k$ and each $\alpha$, we generated 100 tests and found experimentally the best (lowest) value of $t$ that had a 100% success rate. We compared this experimental value to the bounds we obtained in Theorems 1 and 2. For the first value of $t$ that does not have a 100% success rate and for $k \geq 3$, we analyzed the rate of failures due to Assumption 1 not being valid. Note that failures can be of two different kinds: the first possibility is that $\|v_0\|$ is greater than the Gaussian heuristic, and the second one is that $\|v_0\|$ is smaller than the Gaussian heuristic yet $v_0$ is not a shortest vector of the lattice (that is, Assumption 1 does not hold). We recorded the percentage of cases where Assumption 1 was not valid among all the cases where $\|v_0\|$ was smaller than the Gaussian heuristic. These results are shown in Tables 2 and 3. Let us take an example. For $k = 10$ and $\alpha = 200$ (second line of the part corresponding to $k = 10$ in Table 3), Theorem 2 predicts that $v_0$ is a shortest vector of the lattice as soon as $t \geq 227$. It turned out that this was always the case as soon as $t \geq 225$, which is better than expected. For $t = 224$, Assumption 1 was not valid in 3% of the cases.
Table 4. Results for k = 5 and 1024-bit RSA moduli with shared bits in the middle
(α ∈ {99, 100}, t_1 = 20, theoretical bound t ≥ 254)

Experimental t   Failure rate of ||v_0|| <   Failure rate with           Failure rate with
                 Gaussian heuristic          Schnorr-Euchner's algo.     LLL's algo.
261              0%                          0%                          0%
260              0%                          1%                          1%
259              0%                          1%                          1%
258              0%                          1%                          0%
257              0%                          3%                          2%
256              0%                          6%                          5%
255              0%                          17%                         10%
254              0%                          33%                         19%
253              0%                          58%                         28%
252              2%                          90%                         58%
251              96%                         100%                        89%


Let us now analyze the results. In the rigorous case $k = 2$, we observe that the attack consistently goes one bit further than our bound in Theorem 1 with a 100% success rate. In all our experiments concerning the heuristic cases $k \geq 3$, we observed a 100% success rate (thus, Assumption 1 was always true) when $t$ was within the bound (10) of Theorem 2. That means that Theorem 2 was always true in our experiments. Moreover, we were often able to go a few bits (up to 3) beyond the theoretical bound on $t$. When the success rate was not 100% (that is, beyond our experimental bounds on $t$), we found that Assumption 1 was not true in a very limited number of cases (at most 3%). Finally, up to dimension 80, LLL was always sufficient to find $v_0$ when $t$ was within the bound of Theorem 2, and Schnorr-Euchner's algorithm allowed us to go one bit further than LLL in dimension 40.

5.2 Shared Bits in the Middle 

Contrary to the case of shared MSBs, Assumption 1 may fail when we apply our method with shared bits in the middle (see Section 4). When $k = 2$, the phenomenon of exceptional short vectors rarely appeared when $t$ was within the bound of Theorem 3 (less than 1% of failures, independently of the position $t_1$; moreover, we were generally able to go 2 or 3 bits further with a 90% success rate). When $k \geq 3$, this was no longer the case. When Schnorr-Euchner's algorithm did not return $v_0$, we tried to find it in a reduced basis computed by LLL. If neither of these algorithms was able to find $v_0$, then our method failed. Table 4 shows the results of our experiments for $k = 5$ RSA moduli of size $n = 1024$ and $q_i$'s of size $\alpha \in \{100, 99\}$ (see Remark 3). As one can see, our method can be successfully applied in this case. During these experiments, the failure rate of our method was equal to the failure rate of finding $v_0$ in a reduced basis computed by LLL. More generally, our experiments showed that, for problems of the same size, the success rate is approximately 80% when $t$ is within the bound of Theorem 3, and that we could go one or two bits further with a success rate of about 50%.
5.3 Efficiency Comparisons

Additionally, we show in Table 5 the lowest value of $t$ with a 100% success rate and the running time of LLL and Schnorr-Euchner's algorithm for several values of $k$ ($k$ RSA moduli whose factors $p_i$ share $t$ MSBs). For each $k$, we show the worst running time we encountered over 10 tests on an Intel Xeon E5420 at 2.5 GHz. All individual tests completed in less than 1 second for $2 \leq k \leq 20$. We used Schnorr-Euchner's algorithm up to $k = 60$, where it took at most 6200 seconds. LLL completes in under one minute for $20 \leq k \leq 40$ and in less than 30 minutes for $40 \leq k \leq 80$.


Table 5. Running time of LLL and Schnorr-Euchner's algorithm, and bound on t, as a function of k (number of RSA moduli). (Shared MSBs with α = 300 and n = 1024.)


6 Conclusion 

In this article, we have studied the problem of integer factorization with implicit hints. We have presented new lattice-based methods to factorize $k \geq 2$ RSA moduli $N_i = p_iq_i$ with polynomial complexity in $\log(N_i)$ when the $p_i$'s share unknown MSBs or contiguous bits in the middle. In the case $k = 2$ with shared MSBs, our method is the first one to be completely rigorous. These new results can be seen as an extension of those presented in [9] and [15], where, respectively, May and Ritzenhofen gave the same type of results in the case where the $p_i$'s share LSBs, and Sarkar and Maitra presented heuristic methods based on Coppersmith's algorithm for finding small roots of polynomials for $k = 2$ moduli with shared MSBs (and/or LSBs) or bits in the middle. Our method gives theoretical results comparable to those of May and Ritzenhofen, and it is more efficient than Sarkar and Maitra's method.
Whether the method can be applied to $k \geq 3$ RSA moduli $N_i$ whose $p_i$'s share both MSBs and LSBs remains an open issue. In this case, the problem has many more variables and our method cannot be directly applied. One possible way to attack this problem is to use algebraic techniques, in particular elimination theory, jointly with lattice-based methods. This would be an interesting focus for future research.
A Common Results on Lattices

An integer lattice $L$ is an additive subgroup of $\mathbb{Z}^n$. Equivalently, it can be defined as the set of all integer linear combinations of $d$ linearly independent vectors $b_1, \ldots, b_d$ of $\mathbb{Z}^n$. The integer $d$ is called the dimension of $L$, and $B = (b_1, \ldots, b_d)$ is one of its bases. All the bases of $L$ are related by a unimodular transformation. The volume (or determinant) of $L$ is the $d$-dimensional volume of the parallelepiped spanned by the vectors of a basis of $L$, and is equal to the square root of the determinant of the Gramian matrix of $B$. It does not depend on the choice of $B$. We denote it by $\mathrm{Vol}(L)$.

We state (without proofs) common results on lattices that are used throughout this paper. Readers interested in more details and proofs can refer to [10].

Definition 1. For $1 \leq r \leq d$, let $\lambda_r(L)$ be the least real number such that there exist at least $r$ linearly independent vectors of $L$ of Euclidean norm smaller than or equal to $\lambda_r(L)$. We call $\lambda_1(L), \ldots, \lambda_d(L)$ the $d$ minima of $L$, and we call $g(L) = \frac{\lambda_2(L)}{\lambda_1(L)} \geq 1$ the gap of $L$.

Lemma 4 (Hadamard). Let $B = (b_1, \ldots, b_d)$ be a basis of a $d$-dimensional integer lattice $L$ of $\mathbb{Z}^n$. Then the inequality $\prod_{i=1}^{d}\|b_i\| \geq \mathrm{Vol}(L)$ holds.

Theorem 4 (Minkowski). Let $L$ be a $d$-dimensional lattice of $\mathbb{Z}^n$. Then there exists a nonzero vector $v$ in $L$ which verifies $\|v\| \leq \sqrt{d}\,\mathrm{Vol}(L)^{\frac{1}{d}}$. An immediate consequence is that $\lambda_1(L) \leq \sqrt{d}\,\mathrm{Vol}(L)^{\frac{1}{d}}$.

Theorem 5 (Lagrange reduction). Let $L$ be a 2-dimensional lattice of $\mathbb{Z}^n$, given by a basis $B = (b_1, b_2)$. Then one can compute a Lagrange-reduced basis $B' = (v_1, v_2)$ of $L$ in time $O\left(n\log^2(\max(\|b_1\|, \|b_2\|))\right)$. Besides, it verifies $\|v_1\| = \lambda_1(L)$ and $\|v_2\| = \lambda_2(L)$. More information about the running time of Lagrange reduction may be found in [?].
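For $k = 2$ the lattice (9) has rank 2, so Theorem 5 applies directly; this is how the rigorous case $k = 2$ was handled in Section 5. The sketch below implements Lagrange reduction with exact integers and applies it to a toy shared-MSB instance with rows $(K, 0, N_2)$ and $(0, K, -N_1)$ and $K = \lfloor 2^{n-t+1/2} \rfloor$. The generator, the helper names and the sizes ($n = 80$, $\alpha = 10$, $t = 40$, well above the bound $t \geq 2\alpha + 3$ of Table 2) are assumptions; the $p_i$ are not forced to be prime.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def lagrange_reduce(b1, b2):
    """Lagrange (Gauss) reduction of a rank-2 integer basis, in exact arithmetic.
    Returns (v1, v2) with ||v1|| = lambda_1(L) and ||v2|| = lambda_2(L)."""
    b1, b2 = list(b1), list(b2)
    if dot(b1, b1) > dot(b2, b2):
        b1, b2 = b2, b1
    while True:
        n1 = dot(b1, b1)
        mu = (2 * dot(b1, b2) + n1) // (2 * n1)  # nearest integer to <b1, b2> / <b1, b1>
        b2 = [y - mu * x for x, y in zip(b1, b2)]
        if dot(b2, b2) >= n1:
            return b1, b2
        b1, b2 = b2, b1


def toy_instance(seed, n=80, alpha=10, t=40):
    """Hypothetical generator: distinct alpha-bit primes q1, q2 and (n - alpha)-bit
    integers p1, p2 sharing their t most significant bits."""
    rng = random.Random(seed)

    def small_prime():
        while True:
            c = rng.randrange(2 ** (alpha - 1), 2 ** alpha)
            if all(c % f for f in range(2, math.isqrt(c) + 1)):
                return c

    q1 = small_prime()
    q2 = small_prime()
    while q2 == q1:
        q2 = small_prime()
    low = n - alpha - t
    top = rng.randrange(2 ** (t - 1), 2 ** t)
    p1 = (top << low) | rng.getrandbits(low)
    p2 = (top << low) | rng.getrandbits(low)
    return p1 * q1, p2 * q2, q1, q2


def two_moduli_attack(N1, N2, n, t):
    """Shared-MSB lattice for k = 2 with rows (K, 0, N2) and (0, K, -N1)."""
    K = math.isqrt(2 ** (2 * (n - t) + 1))  # floor(2^(n - t + 1/2))
    v1, _ = lagrange_reduce([K, 0, N2], [0, K, -N1])
    return abs(v1[0]) // K, abs(v1[1]) // K


N1, N2, q1, q2 = toy_instance(seed=1)
print(two_moduli_attack(N1, N2, n=80, t=40) == (q1, q2))
```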

Theorem 6 (Kannan's algorithm, see [?]). Let $L$ be a $d$-dimensional lattice of $\mathbb{Z}^n$ given by a basis $(b_1, \ldots, b_d)$. One can compute a shortest vector of $L$ (with norm equal to $\lambda_1(L)$) in time $O\left(\mathcal{P}(\log B, n)\,d^{\frac{d}{2e}+o(d)}\right)$, where $\mathcal{P}$ is a polynomial and $B = \max_i \|b_i\|$. This is done by computing an HKZ-reduced basis of $L$.

Theorem 7 (LLL). Let $L$ be a $d$-dimensional lattice of $\mathbb{Z}^n$ given by a basis $(b_1, \ldots, b_d)$. Then the LLL algorithm computes a reduced basis $(v_1, \ldots, v_d)$ that approximates a shortest vector of $L$ within an exponential factor: $\|v_1\| \leq 2^{\frac{d-1}{4}}\,\mathrm{Vol}(L)^{\frac{1}{d}}$. The running time of Nguyen and Stehlé's version is $O\left(d^5(d + \log B)\log B\right)$, where $B = \max_i \|b_i\|$; see [?].
In practice, the LLL algorithm is known to perform much better than expected. It has been experimentally established in [?] that we can expect the bound $\|v_1\| \leq 1.0219^d\,\mathrm{Vol}(L)^{\frac{1}{d}}$ on random lattices, and that finding a shortest vector of a lattice with gap greater than $1.0219^d$ should be easy using LLL.

B Exact Computation of the Volume of Lattice L of Section 3

In this section, we compute exactly the volume of the lattice $L$ defined at the beginning of Section 3. As a visual example of the construction of this lattice, the reader may look at the matrix defined in equation (9) in the case $k = 4$. We use the notation of Section 3.

Lemma 5. Let $L$ be the lattice whose construction is described at the beginning of Section 3. Then its volume is equal to $\mathrm{Vol}(L) = K\left(K^2 + \sum_{i=1}^{k} N_i^2\right)^{\frac{k-1}{2}}$.

Proof. Let $G$ be the Gramian matrix (of size $k \times k$) of $L$. Its diagonal terms are $\langle v_i, v_i \rangle = K^2 + \sum_{j \neq i} N_j^2$ and its other terms are $\langle v_i, v_j \rangle = -N_iN_j$. Observe that we can rewrite $G$ as $G = \left(K^2 + \sum_{i=1}^{k} N_i^2\right) I_{k\times k} + J$, where $I_{k\times k}$ is the identity matrix of size $k$ and $J$ is the $k \times k$ matrix with terms $-N_iN_j$. If we let $\chi_J(X) = \det(J - X\,I_{k\times k})$ be the characteristic polynomial of $J$ and $\lambda_0 = K^2 + \sum_{i=1}^{k} N_i^2$, we observe that $\det(G) = \chi_J(-\lambda_0)$.

All the columns of $J$ are multiples of $(N_1, N_2, \ldots, N_k)^T$, so the rank of $J$ is 1. The matrix $J$ therefore has the eigenvalue 0 with multiplicity $k - 1$. The last eigenvalue is computed using its trace: $\mathrm{Tr}(J) = -\sum_{i=1}^{k} N_i^2$. Therefore, up to a sign, $\chi_J(X) = X^{k-1}\left(X + \sum_{i=1}^{k} N_i^2\right)$. We conclude that $\det(G) = \chi_J\left(-K^2 - \sum_{i=1}^{k} N_i^2\right)$; since $\det(G) > 0$, this gives $\det(G) = K^2\left(K^2 + \sum_{i=1}^{k} N_i^2\right)^{k-1}$ and
$$\mathrm{Vol}(L) = \sqrt{\det(G)} = K\left(K^2 + \sum_{i=1}^{k} N_i^2\right)^{\frac{k-1}{2}}.$$
□
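Lemma 5 can also be checked with exact arithmetic on small instances. The sketch below (names are ours, integers arbitrary) builds the basis of Section 3, computes the determinant of its Gram matrix over the rationals, and compares it with $K^2\left(K^2 + \sum_i N_i^2\right)^{k-1} = \mathrm{Vol}(L)^2$.

```python
from fractions import Fraction
from itertools import combinations


def msb_basis(N, K):
    """Rows of the lattice of Section 3 (pairs {a, b}, a < b, ordered as in (9))."""
    k = len(N)
    rows = []
    for i in range(k):
        row = [K if j == i else 0 for j in range(k)]
        row += [N[b] if i == a else -N[a] if i == b else 0
                for a, b in combinations(range(k), 2)]
        rows.append(row)
    return rows


def gram(rows):
    return [[sum(x * y for x, y in zip(u, v)) for v in rows] for u in rows]


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result


N, K = [221, 323, 437, 667], 5
lhs = det(gram(msb_basis(N, K)))
rhs = K ** 2 * (K ** 2 + sum(x * x for x in N)) ** (len(N) - 1)
print(lhs == rhs)  # True
```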
Abstract. In many practical settings, participants are willing to deviate from the protocol only if they remain undetected. Aumann and Lindell introduced a concept of covert adversaries to formalize this type of corruption. In the current paper, we refine their model to get stronger security guarantees. Namely, we show how to construct protocols where malicious participants cannot learn anything beyond their intended outputs and honest participants can detect malicious behavior that alters their outputs. As this construction does not protect honest parties from selective protocol failures, a valid corruption complaint can leak a single bit of information about the inputs of honest parties. Importantly, it is often up to the honest party to decide whether to complain or not. This potential leakage is often compensated by gains in efficiency: many standard zero-knowledge proof steps can be omitted. As a concrete practical contribution, we show how to implement consistent versions of several important cryptographic protocols such as oblivious transfer, conditional disclosure of secrets and private inference control.

Keywords: Consistency, equivocal and extractable commitment, 
oblivious transfer, private inference control. 


1 Introduction 

Although classical results assure the existence of secure two- and multi-party protocols for any functionality in the presence of malicious adversaries, the computational overhead is often prohibitively large in practice. Hence, cryptographers have sought more restricted models of malicious behavior, which are still realistic but facilitate more efficient protocol construction. The model of covert adversaries [2] proposed by Aumann and Lindell considers a setting where corrupted parties are unwilling to deviate from the protocol unless they remain uncaught. More precisely, they defined a hierarchy of security models where malicious behavior that alters the outputs of honest parties is detectable with high probability. However, none of these models guarantees input privacy, because a malicious adversary might issue a detectable attack that completely reveals the inputs of all honest parties. We extend their hierarchy with a new security model (consistent computing), which guarantees that malicious participants cannot learn anything beyond their intended outputs and honest participants can

P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010, LNCS 6056, pp. 88-106, 2010.
Table 1. Comparison of various security objectives in a malicious model

Objective        Input-privacy    Output-privacy   Complaint handling   Detectability

Multi-party protocols
Security         Yes              Yes              Secure               Optional
Consistency      Limited leaks    Limited leaks    Possible             Optional
K-leakage        Limited leaks    Limited leaks    Possible             No
Covert Model     No               No               Impossible           Partial
Privacy          No               No               Impossible           No

Two-party protocols
Security         Yes              Yes              Secure               Yes
Consistency      Yes              Yes              Possible             Yes
K-leakage        Limited leaks    Limited leaks    Possible             No
Covert Model     No               No               Impossible           Yes
Privacy          No               No               Impossible           No


detect malicious behavior that alters their outputs. As a result, a valid corruption complaint leaks only a single bit of information about the inputs of honest parties, as opposed to their complete disclosure. Moreover, an honest participant can often decide whether to complain or not. If a complaint is not filed, then no information is leaked at all, unless the adversary learns it indirectly.

Our security model also guarantees that no participant can change their input during a multi-round protocol that consists of many sub-protocols, i.e., there exists an input that is consistent with all outputs. Additionally, the client can prove cheating to third parties without active participation from the server, since the protocol failure together with a proof of the correctness of the client's actions is sufficient. Hence, our security model is sufficient for many client-server applications where a server's long-term reputation is more valuable than the information revealed by corruption complaints.

Finally, note that the ability to distinguish cheating from legitimate protocol failures can be important as well. A good example is private inference control [31], where the client makes queries to the server's database. To protect the server's privacy, certain query patterns are known to be forbidden and should be rejected, though without the server necessarily learning which of the "forbidden" query patterns was used. Hence, a client really needs to know whether the query failed due to insufficient privileges or because the server cheated.

Our Contributions. Our main contribution is the new security model, which provides stricter security guarantees than the semi-honest model, all flavors of covert models [2], and the K-leakage model [12], as depicted in Table 1.

We also present concrete, efficient protocols for consistent adaptive oblivious transfer and consistent conditional disclosure of secrets. Notably, all our constructions are much more efficient than their fully secure counterparts. For instance, the new consistent oblivious transfer protocol is secure against unbounded malicious clients, uses 2 messages per query, and has communication and computation comparable to those of the underlying private oblivious transfer protocol. As a main technical tool, we use list commitment schemes, which allow one to commit to a list of elements so that, given a short certificate, one can later verify the value of a single element of the committed list. Besides the conventional hiding and binding properties, we need equivocality and extractability. See Sect. 3 for details and constructions.
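To make the interface of a list commitment concrete, the following sketch uses a salted Merkle tree over SHA-256: the commitment is the root, and the short certificate for position $i$ is the salt plus the sibling hashes on the path to the root. This generic hash-based illustration is our own choice, not the construction of Sect. 3; its hiding property is only heuristic (it relies on the salts and on SHA-256), and it does not provide the equivocality and extractability required here.

```python
import hashlib
import os


def _h(*parts):
    return hashlib.sha256(b"".join(parts)).digest()


def commit_list(elements):
    """Commit to a non-empty list of byte strings; returns (root, opening state).
    Each leaf is salted with 16 fresh random bytes."""
    if not elements:
        raise ValueError("cannot commit to an empty list")
    salts = [os.urandom(16) for _ in elements]
    level = [_h(b"\x00", s, e) for s, e in zip(salts, elements)]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        levels.append(level)
        if len(level) == 1:
            return level[0], (salts, levels)
        level = [_h(b"\x01", level[i], level[i + 1]) for i in range(0, len(level), 2)]


def open_element(state, i):
    """Short certificate for position i: its salt and the sibling hashes up to the root."""
    salts, levels = state
    path, idx = [], i
    for level in levels[:-1]:
        path.append(level[idx ^ 1])
        idx //= 2
    return salts[i], path


def verify(root, i, element, certificate):
    """Recompute the root from the claimed element and the certificate."""
    salt, path = certificate
    node, idx = _h(b"\x00", salt, element), i
    for sibling in path:
        node = _h(b"\x01", node, sibling) if idx % 2 == 0 else _h(b"\x01", sibling, node)
        idx //= 2
    return node == root


items = [b"alpha", b"beta", b"gamma"]
root, state = commit_list(items)
cert = open_element(state, 2)
print(verify(root, 2, b"gamma", cert), verify(root, 2, b"delta", cert))  # True False
```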

Notation. Throughout this paper, $k$ denotes the security parameter and $\{A_k\}$ is shorthand for a non-uniform adversary. The shorthand $t(k) \in \mathrm{poly}(k)$ denotes that $t(k)$ can be bounded by a polynomial, and $\varepsilon(k) \in \mathrm{negl}(k)$ means that $\varepsilon(k)$ decreases asymptotically faster than the reciprocal of any polynomial.

Full Version and History. The full version of this paper can be found at [20]. The first version of this eprint, from March 2006, already defines consistency (although under a different name). The 2-message argument system from [?] was influenced by the first version of the eprint.


2 Definition of Consistent Computations 

Achieving security against malicious behavior usually involves a large computational overhead, since one must provide a universal fraud detection mechanism such that honest parties can detect a fraud even if it does not affect their concrete private outputs. As a possible trade-off between efficiency and security, we could protect honest parties only against actions that alter their outputs. As a result, malicious adversaries might still cause selective protocol failures, where honest parties fail if their inputs are in a specific range. In the following, we use the standard ideal versus real world paradigm to formalize this concept of consistent computations for various protocols. Note that we use standard security definitions [8,18] with modified ideal world implementations, which give additional power to the adversary; see Fig. 1 for an example.

For clarity and brevity, we present the definitions without delving into subtle technical issues. In particular, we have omitted all low-level details of the ideal and real world executions, as these are thoroughly discussed in common reference materials [8,18]. Other, more model-specific details are discussed separately at the end of the section.

[Fig. 1 diagram: $\mathcal{P}_1$, TTP, $\mathcal{P}_2$; messages $x_1$, $x_2$, $\pi$; test $\pi(x_1, x_2) = 0$; outputs $y_2$ or $\perp$, $y_1$; "Abort or Proceed".]


Fig. 1. Ideal world model for consistent two-party computations. A corrupted participant $\mathcal{P}_2$ can cause selective halting by specifying a predicate $\pi(\cdot)$. In the standard model, the dominant party $\mathcal{P}_2$ can cause only a premature abortion.
Idealized Implementations. In an idealized two-party protocol corresponding to consistent computing, both parties send their inputs $x_1, x_2$ to the trusted third party TTP, which computes the corresponding outputs $y_1, y_2$. Next, a corrupted participant sends the description of a randomized halting predicate $\pi(\cdot)$ to TTP, which internally computes $\pi(x_1, x_2)$. If $\pi(x_1, x_2) = 1$, then TTP halts the computation and sends $\perp$ to the honest participant. If $\pi(x_1, x_2) = 0$, then TTP sends back the outputs $y_i$ exactly as in the standard ideal model. In particular, the corrupted party can still cause a premature abortion and thus still learn its output.
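A toy model of this ideal world may help. In the sketch below (names are ours), the corrupted party supplies a halting predicate and an abort decision. The text does not say what the corrupted party receives when $\pi(x_1, x_2) = 1$; the sketch assumes it also gets $\perp$.

```python
from typing import Any, Callable, Optional, Tuple

BOTTOM = None  # models the failure symbol ⊥


def ideal_consistent_2pc(
    f: Callable[[Any, Any], Tuple[Any, Any]],
    x1: Any,
    x2: Any,
    corrupted: int,
    predicate: Callable[[Any, Any], int],
    abort: bool = False,
) -> Tuple[Optional[Any], Optional[Any]]:
    """Toy ideal world of Fig. 1; returns (output of P1, output of P2).
    `corrupted` (1 or 2) names the corrupted party, `predicate` is its halting
    predicate pi, and `abort` is its abort-or-proceed choice."""
    outputs = list(f(x1, x2))  # TTP computes y1, y2
    honest = 3 - corrupted
    if predicate(x1, x2) == 1:  # selective halting
        return BOTTOM, BOTTOM  # assumption of this sketch: nobody gets an output
    if abort:  # premature abortion, as in the standard ideal model
        outputs[honest - 1] = BOTTOM
    return outputs[0], outputs[1]


def compare(a, b):
    return a > b, a > b  # both parties learn whether x1 > x2


def leaky(a, b):
    return int(a < 100)  # corrupted P2 halts iff P1's input is below 100


print(ideal_consistent_2pc(compare, 50, 70, corrupted=2, predicate=leaky))   # (None, None)
print(ideal_consistent_2pc(compare, 150, 70, corrupted=2, predicate=leaky))  # (True, True)
```

Here a halting event reveals only whether $x_1 < 100$, i.e., a single bit about the honest party's input.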

Generalization to the multi-party setting is straightforward. However, there
are two subtle issues connected with fairness and detectability. A protocol guarantees
fair selective abortion if an adversary can specify only a single predicate
$\pi(\cdot)$ such that TTP halts the computations and sends $\bot$ to all participants iff
$\pi(x_1, \ldots, x_n) = 1$. Alternatively, corrupted participants can separately specify
different halting predicates $\pi_i(\cdot)$ for each party $\mathcal{P}_i$, and thus some parties might
get their outputs while others do not. Also, note that the identity of the malicious
coalition might remain hidden in multi-party protocols, whereas this cannot
happen in a two-party protocol. A consistent protocol provides detectability
if TTP sends $\bot$ to $\mathcal{P}_i$ together with the identity of a corrupted participant who
specified the halting predicate whenever $\pi_i(x_1, \ldots, x_n) = 1$.

Consistency can also be formalized for adaptive computations, where the outputs
of each round can depend on the inputs submitted in previous rounds.
For the sake of brevity, we define consistency only for client-server protocols,
where the server initially commits to his or her input, and after that the client
can issue various oblivious queries. This model covers many practical settings,
such as selling digital goods and private inference control [?]. To start such
a protocol, a server sends his or her input $x$ to TTP. After that, the client(s)
can adaptively issue various queries $q_i$ to TTP. When a query $q_i$ arrives, TTP
sends a notification message to the server, who can then specify a description of
a halting predicate $\pi_i(q_1, \ldots, q_i)$. Next, TTP evaluates the predicate and sends
$f(q_i, x)$ back to the client if $\pi_i(q_1, \ldots, q_i) = 0$; otherwise the client receives $\bot$.
As a small subtlety, note that the ability to issue halting predicates one by one
is needed only in the adaptive corruption model, where there are many clients
and the server might become corrupted in the middle of the computations.
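The adaptive client-server ideal world can be sketched as a small stateful TTP. This is an illustration under our own assumptions: `AdaptiveConsistentTTP` and `choose_predicate` are hypothetical names, `None` encodes $\bot$, and the server's strategy is modeled as a function that receives only the query counter $i$ (never the query itself) and returns the predicate $\pi_i$, which the TTP then evaluates on $(q_1, \ldots, q_i)$.

```python
from typing import Any, Callable, List, Tuple

BOT = None  # hypothetical encoding of ⊥


class AdaptiveConsistentTTP:
    """Ideal world for consistent client-server computations (sketch)."""

    def __init__(self, f: Callable[[Any, Any], Any], x: Any,
                 choose_predicate: Callable[[int], Callable[[Tuple[Any, ...]], int]]):
        self.f = f
        self.x = x  # the server's input, fixed once at the start
        self.choose_predicate = choose_predicate
        self.queries: List[Any] = []

    def query(self, q: Any) -> Any:
        self.queries.append(q)
        i = len(self.queries)
        # Notify the server; it chooses pi_i without seeing q.
        pi_i = self.choose_predicate(i)
        if pi_i(tuple(self.queries)) == 1:
            return BOT
        return self.f(q, self.x)
```

For instance, with the oblivious-transfer functionality `f = lambda q, x: x[q]` and a server that halts only the second query when the first query was `0`, the client may receive `None` once, but every non-$\bot$ answer is consistent with the same fixed database `x`.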

Formal Security Definition. As usual, we define security of a protocol by comparing
its output distribution in the standalone setting with the corresponding
output distribution in the ideal world. More formally, fix a security parameter $k$
and let $\mathcal{D}_k$ denote the input distribution of all parties, including the adversary $\mathcal{A}_k$.
W.l.o.g. we assume that each input is a pair $(\phi_i, x_i)$, where the auxiliary input
$\phi_i$ models the internal state of the participant before the protocol and $x_i$ is the
actual protocol input. Now, if we fix the exact details of how protocols are executed
and what the plausible attacks are, then a protocol instance $\Pi_k$ and an adversary
$\mathcal{A}_k$ together uniquely determine a joint output distribution $\mathrm{Real}_{\mathcal{D}_k}(\mathcal{A}_k, \Pi_k)$ of
all parties including $\mathcal{A}_k$. Let $\mathrm{Ideal}_{\mathcal{D}_k}(\mathcal{A}^\circ_k, \Pi^\circ_k)$ denote the joint output distribution
determined by the ideal world adversary and the corresponding ideal world




S. Laur and H. Lipmaa 


implementation $\Pi^\circ_k$. We say that the protocol family $\{\Pi_k\}$ securely implements
$\{\Pi^\circ_k\}$ if for any non-uniform polynomial-time adversary $\{\mathcal{A}_k\}$ there exists a non-uniform
polynomial-time adversary $\{\mathcal{A}^\circ_k\}$ such that for any input distribution
family $\{\mathcal{D}_k\}$, the output distributions $\mathrm{Real}_{\mathcal{D}_k}(\mathcal{A}_k, \Pi_k)$ and $\mathrm{Ideal}_{\mathcal{D}_k}(\mathcal{A}^\circ_k, \Pi^\circ_k)$
are computationally indistinguishable. If the output distributions are statistically
indistinguishable or coincide, then we talk about statistical and perfect
security, respectively. Finally, a protocol family $\{\Pi_k\}$ correctly implements $\{\Pi^\circ_k\}$ if for any
input distribution family $\{\mathcal{D}_k\}$ the output distributions coincide provided that
the adversaries remain inactive (corrupt nobody) in both worlds.

Basic Properties. Note that the only difference between the formal definitions
of consistency and security in the malicious model is in the description of
the ideal world execution. Hence, we can treat a consistent protocol as a secure
implementation of a modified functionality that allows explicit specification of
halting predicates. As a result, standard composability results carry over and
each consistent protocol in a sequential composition can be replaced with the
ideal implementation. However, the resulting hybrid protocol does not necessarily
correspond to a consistent ideal world execution. For instance, if we execute
two client-server protocols in a row, then the server’s input is not guaranteed to
be the same for both ideal implementations. Also, a malicious server can specify
two halting predicates instead of a single one.

The main advantage of consistent computations over other weakened security
models is an explicit correctness guarantee. By the construction of the idealized
model of consistent computations, an honest participant reaches the accepting
state iff his or her output is consistent with the inputs submitted at the beginning
of the protocol. Hence, a successful protocol run provides consistency guarantees
in the real world as well. Consequently, a non-accepting honest participant can
prove without the help of other participants that a malicious attack was carried
out. Moreover, any consistent protocol can be augmented with a complaint-handling
mechanism that reveals nothing beyond the validity of the complaint.

Theorem 1. Let a protocol family $\{\Pi_k\}$ be a correct and consistent implementation
of a functionality $\{\Pi^\circ_k\}$ such that all messages are signed by their creators.
Then an honest participant can prove the existence of a malicious attack that alters
his or her output without help from others, provided that the signature scheme
is secure. This proof can be converted to a zero-knowledge proof if the messages
received by the honest participant reveal nothing about his or her input.

Proof (Sketch). For the proof, note that by our security assumptions no participant
can forge messages sent by others. Hence, if an honest party reveals his or
her input and randomness together with all received messages, then anybody can
verify the correctness of his or her computations. Since the protocol correctly implements
$\{\Pi^\circ_k\}$, semi-honestly behaving participants cannot cause a non-accepting output.
This proof can be converted to a zero-knowledge proof, since it is sufficient to
present all received messages and then prove that there exists a valid input and
randomness that leads to the non-accepting state. The corresponding statement



On the Feasibility of Consistent Computations 




belongs to an NP-language and thus has an efficient zero-knowledge proof. The 
claim follows as messages in the proof reveal nothing about his or her input. □ 
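The non-zero-knowledge version of the complaint in the proof of Theorem 1 amounts to replaying the honest party. The sketch below assumes a deterministic next-state interface (`init`, `step`, `accepts`) and a signature-checking callback `signature_ok`; all of these names are hypothetical, and signature verification itself is abstracted into that callback rather than implemented.

```python
from typing import Any, Callable, Sequence


def verify_complaint(
    init: Callable[[Any, Any], Any],      # (input x, randomness rho) -> initial state
    step: Callable[[Any, Any], Any],      # (state, received message) -> next state
    accepts: Callable[[Any], bool],       # final state -> accepting?
    x: Any,
    rho: Any,
    received: Sequence[Any],
    signature_ok: Callable[[Any], bool],  # stands in for verifying the sender's signature
) -> bool:
    """Return True iff the complaint of an honest party is valid.

    The complaint is valid when every received message carries a valid
    signature (so the complainant cannot have forged it) and the honest
    computation replayed on (x, rho) ends in a non-accepting state.
    """
    if not all(signature_ok(m) for m in received):
        return False
    state = init(x, rho)
    for m in received:
        state = step(state, m)
    return not accepts(state)
```

Since this replay reveals $x$ and $\rho$, the zero-knowledge variant in Theorem 1 instead proves that some valid pair $(x, \rho)$ makes this check succeed.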

Note that the last assumption in Theorem 1 is not a real restriction and can be
easily met by using a secure public key cryptosystem. Namely, if all messages
are encrypted with the public keys of the corresponding recipients, then the messages leak
no information to outside observers, but the protocol remains consistent.

However, unlike in secure computations, a valid complaint reveals additional
information, namely, the adversary learns that the corresponding halting
predicate $\pi_i(x_1, \ldots, x_n)$ holds. On the other hand, an honest party does not have
to issue a complaint and thus the adversary is not guaranteed to learn halting
predicates; in some applications, the honest parties can untraceably recover
from protocol failures. For all consistent and detectable protocols, such a complaint
also reveals the identity of the maliciously behaving participant. Hence,
there is a trade-off between the utility of a single bit $\pi_i(x_1, \ldots, x_n)$ and the long-term
reputation of a participant. As a result, consistency of computations is
an adequate protection mechanism for all settings where participants are unwilling
to cheat if they are caught with high probability, or where a single bit of leakage
is much smaller than the amount of information revealed by legitimate
protocol outputs. For instance, the intended output of privacy-preserving data
aggregation is usually several kilobytes (if not megabytes) long, and therefore
the effect of a single bit of leakage is likely to be irrelevant.

The same argumentation also holds for consistent protocols without accountability.
However, finding the identity of the culprit is difficult in such settings,
because everybody must prove the correctness of their actions and the corresponding
zero-knowledge proofs can be intractable in practice. Finally, note that
the potential damage of a valid complaint depends on the set of possible halting
predicates $\pi_i$. In Section 6 we study this question explicitly and show how to
restrict the class of enforceable predicates.

Relation to Other Security Definitions. The concept of covert corruption is
rather old and has been discussed in many contexts. The earliest definitions were
given for the multi-party setting [?] and only recently modified to work in two-party
settings by Aumann and Lindell [2]. In particular, note that the definition
of $t$-detectability given in [15] and various definitions of $\varepsilon$-detectability given
in [2] guarantee only that malicious behavior, which alters the outputs of honest
parties, is detected with notable probability. However, none of the definitions
limits the amount of information acquired during a successful fraud attempt.
Thus, our definition of consistent computations is a natural strengthening of
these definitions, which also guarantees the privacy of inputs. Another related
security notion is the $k$-leakage model [25], where the adversary can learn up
to $k$ bits of auxiliary information about the inputs of honest parties. Similarly
to consistent computations, the adversary cannot alter outputs without being
detected. However, unlike in consistent computations, the information
is guaranteed to reach the adversary and such an attack is undetectable. Hence,
the $k$-leakage model provides less strict security guarantees. See Table 1 for a
comprehensive summary.
Subtle Details. Note that halting predicates must be efficiently computable.
Otherwise, participation in an idealized computation can provide significant
gains to the adversary. Hence, we require that for any adversary $\{\mathcal{A}_k\}$, the
time needed to evaluate halting predicates is polynomial in the running time
of $\{\mathcal{A}_k\}$.

Also, observe that many important cryptographic protocols are not secure
in the strict sense. The problem starts with classical zero-knowledge proofs, for
which we only know how to construct simulators $\mathcal{A}^\circ$ that work in expected polynomial
time. However, a model where ideal world adversaries have expected polynomial
running time causes many technical and philosophical drawbacks [?].
For instance, we lose sequential composability guarantees. Hence, we use an
alternative formalization. A protocol $\{\Pi_k\}$ is secure in the weak polynomial security
model if for any time bound $t(k) \in \mathrm{poly}(k)$, for any notable difference
$\varepsilon(k) \in \Omega(k^{-c})$, and for any polynomial-time real world adversary $\{\mathcal{A}_k\}$, there
exists a polynomial-time ideal world adversary $\{\mathcal{A}^\circ_k\}$ such that no non-uniform
distinguisher $\{\mathcal{B}_k\}$ with running time $t(k)$ can distinguish $\mathrm{Real}_{\mathcal{D}_k}(\mathcal{A}_k, \Pi_k)$
and $\mathrm{Ideal}_{\mathcal{D}_k}(\mathcal{A}^\circ_k, \Pi^\circ_k)$ with advantage more than $\varepsilon(k)$. This definition has the
virtue of being formalized with strict time bounds and is thus free of technical
issues. In particular, it is sequentially composable and formalizes our knowledge
about the reductions as precisely as possible.

3 List Commitment Schemes 

To achieve consistency, a corrupted participant must be unable to change his or
her input during the protocol without being caught. The latter can be achieved
by forcing participants to commit to their inputs. More precisely, we need commitment
schemes for lists of elements, such that individual elements can later
be decommitted by presenting short certificates. A list commitment scheme is a
quadruple of probabilistic polynomial-time algorithms (gen, com, cert, open) with
the following semantics. The key-generator algorithm $\mathsf{gen}(1^k)$ is used to generate
public parameters $ck$ that fix the message space $\mathcal{M}_k$ and the maximal number
of list elements $N_k \in \mathrm{poly}(k)$. Given a list $x = (x_1, \ldots, x_n) \in \mathcal{M}_k^n$ with $n \le N_k$,
the commitment algorithm $\mathsf{com}_{ck}(x)$ outputs a pair $(c, d)$ of commitment and
decommitment values. The certificate generation algorithm $\mathsf{cert}_{ck}(d, i)$ returns
a partial decommitment value (certificate) $s_i$ for the $i$th element. The verification
algorithm $\mathsf{open}_{ck}(c, s)$ returns either a pair $(i, x_i)$ or $\bot$. It is required
that $\mathsf{open}_{ck}(c, \mathsf{cert}_{ck}(d, i)) = (i, x_i)$ for all $i \in \{1, \ldots, n\}$ and all possible values of $ck \leftarrow \mathsf{gen}(1^k)$ and
$(c, d) \leftarrow \mathsf{com}_{ck}(x)$. We now define various (optional) security properties through
games that are played between a challenger and a non-uniform adversary.
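A hash tree gives one concrete instance of the four algorithms. The sketch below is our own illustration, not a construction analysed in this paper: `lc_gen` samples a random salt as the public parameter $ck$ (standing in for choosing a member of a collision-resistant hash family), SHA-256 is assumed as the hash, list elements are byte strings, indices start at $0$, and all function names are hypothetical. The scheme is binding under collision resistance but deliberately not hiding, and each certificate carries roughly $\log_2 n$ sibling hashes.

```python
import hashlib
import os
from typing import List, Optional, Sequence, Tuple

Commitment = Tuple[bytes, int]                # (root digest, list length n)
Certificate = Tuple[int, bytes, List[bytes]]  # (index i, element x_i, sibling path)


def _h(ck: bytes, tag: bytes, *parts: bytes) -> bytes:
    # Domain-separated hash: tag 0x00 = leaf, 0x01 = inner node, 0x02 = padding.
    return hashlib.sha256(ck + tag + b"".join(parts)).digest()


def _depth(n: int) -> int:
    return max(1, (n - 1).bit_length())  # tree height after padding to 2^depth


def lc_gen(k: int = 128) -> bytes:
    return os.urandom(k // 8)


def lc_com(ck: bytes, xs: Sequence[bytes]) -> Tuple[Commitment, list]:
    depth = _depth(len(xs))
    level = [_h(ck, b"\x00", x) for x in xs]
    level += [_h(ck, b"\x02")] * ((1 << depth) - len(xs))
    levels = [level]
    while len(level) > 1:
        level = [_h(ck, b"\x01", level[j], level[j + 1]) for j in range(0, len(level), 2)]
        levels.append(level)
    return (level[0], len(xs)), [levels, list(xs)]


def lc_cert(d: list, i: int) -> Certificate:
    levels, xs = d
    # Sibling of the node on the path from leaf i at every height h.
    return i, xs[i], [levels[h][(i >> h) ^ 1] for h in range(len(levels) - 1)]


def lc_open(ck: bytes, c: Commitment, s: Certificate) -> Optional[Tuple[int, bytes]]:
    (root, n), (i, x, path) = c, s
    if not 0 <= i < n or len(path) != _depth(n):
        return None  # ⊥
    node = _h(ck, b"\x00", x)
    for h, sib in enumerate(path):
        # Bit h of i tells whether the current node is a right or a left child.
        node = _h(ck, b"\x01", sib, node) if (i >> h) & 1 else _h(ck, b"\x01", node, sib)
    return (i, x) if node == root else None
```

Opening `lc_cert(d, i)` returns `(i, xs[i])`, as the correctness requirement demands, while changing either the element or the claimed index inside a certificate makes `lc_open` return `None` unless a SHA-256 collision is found.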

Binding and Hiding. A list commitment scheme is computationally binding if
every polynomial-time adversary $\{\mathcal{A}_k\}$ wins the following game with negligible
probability:

1. Challenger generates $ck \leftarrow \mathsf{gen}(1^k)$ and sends $ck$ to $\mathcal{A}_k$.

2. $\mathcal{A}_k$ generates a commitment $c$ and two certificates $s_0$ and $s_1$.

3. $\mathcal{A}_k$ wins if the commitment can be opened to different values at the same location,
that is, the locations coincide, $i_0 = i_1$, but $\bot \neq x_0 \neq x_1 \neq \bot$ for the
openings $(i_0, x_0) \leftarrow \mathsf{open}_{ck}(c, s_0)$ and $(i_1, x_1) \leftarrow \mathsf{open}_{ck}(c, s_1)$.

A list commitment scheme is statistically hiding if for any non-uniform adversary
$\{\mathcal{A}_k\}$ the probability that $\mathcal{A}_k$ wins the following game is negligibly close to one half:

1. Challenger generates $ck \leftarrow \mathsf{gen}(1^k)$ and sends $ck$ to $\mathcal{A}_k$.

2. $\mathcal{A}_k$ sends two lists $x^{(0)}, x^{(1)} \in \mathcal{M}_k^n$ with $n \le N_k$ to the challenger.

3. Challenger generates a random bit $b \leftarrow \{0, 1\}$, computes
$(c, d) \leftarrow \mathsf{com}_{ck}(x^{(b)})$ and sends the commitment value $c$ to $\mathcal{A}_k$.

4. In the next phase, $\mathcal{A}_k$ can make a number of oracle queries to $\mathsf{cert}_{ck}(d, \cdot)$,
provided that $x^{(0)}_i = x^{(1)}_i$ for any queried index $i$.

5. $\mathcal{A}_k$ wins the game if he or she correctly guesses the bit $b$.


Equivocality. In several proofs, we use simulators that send a fake commitment
value $c$ to the adversary and then gradually open parts of it according to
the instructions sent by TTP. To preserve the closeness of real and simulated
executions in such a setting, the commitment scheme must be equivocal. A list
commitment scheme lc is perfectly equivocal if there exist three additional algorithms
$\mathsf{gen}^\circ$, $\mathsf{com}^\circ$ and $\mathsf{equiv}$, such that no unbounded adversary $\{\mathcal{A}_k\}$ can
distinguish between the following two experiments:


Normal Execution:

1. Challenger generates $ck \leftarrow \mathsf{gen}(1^k)$ and sends $ck$ to $\mathcal{A}_k$.

2. $\mathcal{A}_k$ sends $x = (x_1, \ldots, x_n)$ to the oracle $\mathcal{O}$, who computes
$(c, d) \leftarrow \mathsf{com}_{ck}(x)$, $s_i \leftarrow \mathsf{cert}_{ck}(d, i)$ and replies with $(c, s_1, \ldots, s_n)$.

Simulated Execution:

1. Challenger generates $(ek, ck) \leftarrow \mathsf{gen}^\circ(1^k)$ and sends $ck$ to $\mathcal{A}_k$.

2. The oracle $\mathcal{O}$ computes $(c, \eta) \leftarrow \mathsf{com}^\circ_{ek}(n)$ and, given $x = (x_1, \ldots, x_n)$
from $\mathcal{A}_k$, computes $s_i \leftarrow \mathsf{equiv}_{ek}(c, \eta, i, x_i)$ and replies with $(c, s_1, \ldots, s_n)$.


One can build non-interactive equivocal commitment schemes based on any
one-way function in the common reference string (CRS) model [12]. In the
standard model, 3 messages are needed to implement an equivocal commitment
scheme. Thus, all subsequent results that use equivocal commitment schemes
require at least 3 messages. However, as the initialization phase can be shared
between different runs, the round complexity is not a problem in practice.

Extractability. Many commitment schemes have an explicit extraction mechanism
such that a person who possesses some extra information $sk$ can open
commitments without the decommitment value; see, for instance, [?]. These commitments
are often used in simulator constructions, where one has to extract
inputs for committed values. For obvious reasons, such trapdoors do not exist
when the final commitment value is shorter than the committed
string.

Buldas and Laur showed that if the creator of a commitment gets no additional
information besides the commitment parameters $ck$, then all committed
elements are efficiently extractable given white-box access to the committing
algorithm and to the randomness used by it. See the definition of knowledge-binding
and the corresponding proofs in [5]. However, in the context of two- and
multi-party computations, an adversary always gets additional inputs and thus
we must amend the definition. A list commitment scheme is white-box extractable
if for any polynomial-time adversary $\{\mathcal{A}_k\}$ there exists a polynomial-time extractor
machine $\{\mathcal{K}_{\mathcal{A}_k}\}$ such that for any input distribution $\mathcal{D}_k$ and for any family
of advice strings $\{a_k\}$, the adversary $\mathcal{A}_k$ can win the following game only with negligible
probability. The family of advice strings $\{a_k\}$ in the game models unknown
future events, which might help the adversary to open the commitments.


1. Challenger generates $ck \leftarrow \mathsf{gen}(1^k)$, $\phi \leftarrow \mathcal{D}_k$ and a new random tape $\omega$.

2. $\mathcal{A}_k$ gets $\phi$ and $ck$ as inputs and $\omega$ as the random tape and outputs
a list commitment $c$ together with its size $n$, $(c, n) \leftarrow \mathcal{A}_k(\phi, ck; \omega)$.

3. $\mathcal{K}_{\mathcal{A}_k}$ gets $\phi$, $ck$ and $\omega$ as inputs and outputs $(\hat{x}_1, \ldots, \hat{x}_n) \leftarrow \mathcal{K}_{\mathcal{A}_k}(\phi, ck; \omega)$.

4. Given advice $a_k$, $\mathcal{A}_k$ outputs certificates $(s_1, \ldots, s_m) \leftarrow \mathcal{A}_k(a_k)$.

5. The adversary wins if $\mathcal{A}_k$ outputs at least one certificate that is consistent
with the commitment and that corresponds to a list element not correctly
guessed by the extractor, i.e., if $\exists j : \bot \neq (i, x_i) = \mathsf{open}_{ck}(c, s_j) \wedge x_i \neq \hat{x}_i$.


Currently it is not known how to construct a non-interactive compressing commitment
scheme that is provably white-box extractable.¹ However, if we consider
interactive commitment schemes, where a sender executes a zero-knowledge proof
of knowledge that he or she knows how to open all elements under the list commitment,
we can construct such a knowledge extractor by definition. By using
suitable zero-knowledge techniques as detailed in [24], the total communication
between the receiver and the sender can be made sublinear, although the computational
overhead might be too large for practical applications.

As the security of proofs of knowledge is often defined in a weaker model [5],
we also relax other definitions to be compatible. A list commitment scheme is
weakly white-box extractable if for any polynomial-time adversary $\{\mathcal{A}_k\}$ and any
error bound $\varepsilon(k)$, there exists an extractor machine $\{\mathcal{K}_{\mathcal{A}_k}\}$ such that, for any input
distribution $\mathcal{D}_k$ and for any family of advice strings $\{a_k\}$, the adversary $\mathcal{A}_k$ wins
the extractability game with probability at most $\varepsilon(k)$ and the running time of
$\mathcal{K}_{\mathcal{A}_k}$ is at most $O(\mathrm{poly}(k)/\varepsilon(k))$ times that of $\mathcal{A}_k$.

Double-Layered Commitments. There are two principally different ways
to construct a list commitment scheme with extractability and equivocality
properties. First, one can commit elements individually using an ordinary
commitment scheme with these properties, such as [?]. As a result, we get
strong extractability guarantees but cannot get beyond linear communication
complexity. Alternatively, we can first build a double-layered equivocal commitment
scheme, and then add extractability by an interactive proof of knowledge.
A double-layered commitment scheme dlc is specified by a conventional commitment
scheme cs and a list commitment scheme lc. The key-generator procedure
dlc.gen runs both key generation procedures and outputs the pair of resulting public
parameters $(ck_1, ck_2)$. To commit to $x = (x_1, \ldots, x_n)$, one first computes
conventional commitments $(c_i, d_i) \leftarrow \mathsf{cs.com}_{ck_1}(x_i)$ for $i \in \{1, \ldots, n\}$ and then
outputs $(c_*, d_*) \leftarrow \mathsf{lc.com}_{ck_2}(c_1, \ldots, c_n)$. To decommit $x_i$, one has to first decommit
$c_i$ by giving $\mathsf{lc.cert}_{ck_2}(d_*, i)$ and then also reveal $d_i$ so that the receiver can
compute $\mathsf{cs.open}_{ck_1}(c_i, d_i)$. Other operations are defined analogously.

¹ The results of [5] assure the existence of extractors $\mathcal{K}_{\mathcal{A}_k, \mathcal{D}_k}$ that depend on $\mathcal{D}_k$.
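The two layers can be combined as in the following sketch, which instantiates cs with a Pedersen commitment over a toy group and lc with a hash tree, the pairing suggested later in this section. Everything here is an assumption for illustration only: the parameters $p = 2039$, $q = 1019$, $g = 4$ are far too small to be secure, a fixed SHA-256 stands in for the keyed hash family $ck_2$, indices start at $0$, messages are integers modulo $q$, and all function names are hypothetical.

```python
import hashlib
import secrets
from typing import Optional, Sequence, Tuple

# Toy group (INSECURE): p = 2q + 1 is a safe prime and g = 4 has prime order q.
P, Q, G = 2039, 1019, 4


def H(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def enc(c: int) -> bytes:
    return c.to_bytes(2, "big")  # group elements are < 2^16 here


def depth(n: int) -> int:
    return max(1, (n - 1).bit_length())


def dlc_gen() -> int:
    # ck_1: Pedersen key y = g^t with t discarded (fixed SHA-256 stands in for ck_2).
    return pow(G, 1 + secrets.randbelow(Q - 1), P)


def dlc_com(y: int, xs: Sequence[int]):
    # Lower layer: c_i = g^{x_i} y^{r_i} mod p with fresh r_i.
    rs = [secrets.randbelow(Q) for _ in xs]
    cs = [pow(G, x % Q, P) * pow(y, r, P) % P for x, r in zip(xs, rs)]
    # Upper layer: hash tree over c_1..c_n, padded to a power of two.
    level = [H(b"\x00", enc(c)) for c in cs]
    level += [H(b"\x02")] * ((1 << depth(len(cs))) - len(cs))
    levels = [level]
    while len(level) > 1:
        level = [H(b"\x01", level[j], level[j + 1]) for j in range(0, len(level), 2)]
        levels.append(level)
    return (level[0], len(xs)), (levels, cs, rs, list(xs))


def dlc_cert(d, i: int):
    levels, cs, rs, xs = d
    path = [levels[h][(i >> h) ^ 1] for h in range(len(levels) - 1)]
    return i, xs[i], rs[i], cs[i], path  # upper-layer certificate plus d_i = r_i


def dlc_open(y: int, c, s) -> Optional[Tuple[int, int]]:
    (root, n), (i, x, r, ci, path) = c, s
    if not 0 <= i < n or not 0 <= ci < P or len(path) != depth(n):
        return None
    # Step 1: check that c_i sits at position i under the root.
    node = H(b"\x00", enc(ci))
    for h, sib in enumerate(path):
        node = H(b"\x01", sib, node) if (i >> h) & 1 else H(b"\x01", node, sib)
    if node != root:
        return None
    # Step 2: open the lower-layer Pedersen commitment c_i with (x_i, r_i).
    if ci != pow(G, x % Q, P) * pow(y, r, P) % P:
        return None
    return i, x
```

A certificate thus consists of the upper-layer certificate for $c_i$ together with the lower-layer decommitment $d_i = r_i$, mirroring the two-step decommitment described above.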

Theorem 2. Let lc be a binding list commitment scheme and cs be a conventional
commitment scheme. Then dlc inherits statistical hiding, perfect hiding, computational
binding, statistical equivocality and perfect equivocality from cs.

Proof. Hiding and binding are evident. For equivocality, note that given the
equivocation key $ek$ for cs, it is possible to use $\mathsf{cs.com}^\circ$ to generate a list $c_1, \ldots, c_n$
of fake lower-level commitments that can later be opened to any values using
the function $\mathsf{cs.equiv}_{ek}$, and the claim follows. □

The list commitment scheme does not have to be hiding. Hence, we can use
hash trees to compress large lists into succinct digests. The corresponding construction
is based on a collision-resistant hash function family $\{\mathcal{H}_k\}$, and the
certificates are known to be of length $O(k \log n)$. The Pedersen commitment
scheme [27] is a good candidate for the conventional commitment scheme,
as it is perfectly equivocal in the CRS model and can be easily set up in the standard
model. More precisely, the public parameter is a uniformly chosen group
element $y \in \langle g \rangle$ and the corresponding equivocality trapdoor is the discrete logarithm
of $y$. As a first option, the parameters can be generated jointly by the sender and
the receiver by using a secure three-message multiplication protocol to multiply
two random group elements. Alternatively, the client may specify $y$, since
the Pedersen commitment scheme is perfectly hiding. Then, however, we lose equivocality
unless we are willing to find the discrete logarithm of $y$ in exponential time.
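The role of the discrete logarithm of $y$ as an equivocality trapdoor can be checked on a toy Pedersen instance. In this sketch (our own, with insecure illustrative parameters $p = 2039$, $q = 1019$, $g = 4$ and hypothetical function names), a commitment is $c = g^m y^r \bmod p$. Knowing $t = \log_g y$ lets the holder reopen $c$ to any $m'$ via $r' = r + (m - m')\,t^{-1} \bmod q$, and a fake commitment $c = g^u$ (the role of $\mathsf{com}^\circ$) can be opened to any $m$ with $r = (u - m)\,t^{-1} \bmod q$.

```python
import secrets

# Toy group (INSECURE): p = 2q + 1 is a safe prime and g = 4 has prime order q.
P, Q, G = 2039, 1019, 4


def pedersen_gen():
    """Return (y, t): public parameter y = g^t and equivocality trapdoor t."""
    t = 1 + secrets.randbelow(Q - 1)
    return pow(G, t, P), t


def pedersen_com(y: int, m: int):
    """Commit to m in Z_q; returns (c, r) with c = g^m * y^r mod p."""
    r = secrets.randbelow(Q)
    return pow(G, m % Q, P) * pow(y, r, P) % P, r


def pedersen_open(y: int, c: int, m: int, r: int) -> bool:
    return c == pow(G, m % Q, P) * pow(y, r, P) % P


def pedersen_equiv(t: int, m: int, r: int, m_new: int) -> int:
    """Reopen an honest commitment to m_new: r' = r + (m - m_new) / t mod q."""
    return (r + (m - m_new) * pow(t, -1, Q)) % Q


def pedersen_fake_com():
    """Role of com°: c = g^u is not tied to any message yet."""
    u = secrets.randbelow(Q)
    return pow(G, u, P), u


def pedersen_fake_open(t: int, u: int, m: int) -> int:
    """Role of equiv: r = (u - m) / t mod q, so that g^m * y^r = g^u."""
    return (u - m) * pow(t, -1, Q) % Q
```

Conversely, anyone who produces two openings $(m, r) \neq (m', r')$ of the same $c$ reveals $t = (m - m')/(r' - r) \bmod q$, which is why binding rests on the hardness of the discrete logarithm.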


4 Consistent Adaptive Oblivious Transfer 

Oblivious transfer protocols are often used as building blocks for complex protocols.
In an adaptive oblivious transfer protocol, a server has an input database
$x = (x_1, \ldots, x_n)$ of $\ell$-bit strings and a client can adaptively query up to $m$ elements
from this database. The client should learn nothing beyond $x_{q_1}, \ldots, x_{q_m}$
and the server should learn nothing. In particular, the client should learn
$\bot$ if its query is not in the range $\{1, \ldots, n\}$. In the asymptotic setting, all parameters
$m$, $n$, $\ell$ must be polynomial in the security parameter $k$. Two standard
security notions for the oblivious transfer protocol in the malicious model are
Client’s inputs: adaptively chosen indexes $q_1, \ldots, q_m$.

Server’s inputs: a database $x = (x_1, \ldots, x_n)$.

Common inputs: a description of lc and ot.

Trusted setup

If needed, the trusted dealer executes the shared setup phase for ot.

The trusted dealer broadcasts public parameters $ck \leftarrow \mathsf{lc.gen}(1^k)$ to everybody.

Commitment phase

The server computes $(c, d) \leftarrow \mathsf{lc.com}_{ck}(x)$ and sends the commitment $(c, n)$ to
the client. Then the server computes $s_i \leftarrow \mathsf{lc.cert}_{ck}(d, i)$ for each $i \in \{1, \ldots, n\}$,
and stores the database of partial decommitment values $s \leftarrow (s_1, \ldots, s_n)$.

Query phase. To fetch the $q_i$th element from the database:

1. The client sends $Q_i \leftarrow \mathsf{ot.query}(q_i)$ to the server.

2. The server returns $R_i \leftarrow \mathsf{ot.reply}(s, Q_i)$.

3. The client computes $A_i \leftarrow \mathsf{ot.decode}(q_i, R_i)$ and $(j, x^*) \leftarrow \mathsf{lc.open}_{ck}(c, A_i)$.

If $j = q_i$, then the client outputs $x^*$; otherwise (including when $\mathsf{lc.open}$ returns $\bot$) the client outputs $\bot$.

Protocol 1. The new consistent adaptive oblivious transfer protocol


security and privacy. In brief, private protocols guarantee only that a malicious
client cannot learn anything beyond $x_{q_1}, \ldots, x_{q_m}$, but do not assure that an honest
client indeed learns $x_{q_1}, \ldots, x_{q_m}$ if the server is malicious. As such, they are
inapplicable in many practical applications.

In most adaptive oblivious transfer protocols that are secure in the malicious
model, the server first commits to his or her individual database elements, and
then at every query helps the client to “decrypt” a single database element; see,
for example, [?]. A natural alternative is to use a sublinear-length commitment
scheme together with suitable zero-knowledge techniques as detailed in [21].
However, the resulting low-communication protocol is only a theoretical solution
with computational overhead that is too large for practical applications.

As a practical solution, we show how to convert any private oblivious transfer
protocol into a consistent protocol with low computational and communication
overhead; see Prot. 1. By using the protocols of [17,22] for oblivious transfer, we get an
efficient protocol with almost optimal communication. For the sake of simplicity,
we assume that the underlying private 1-out-of-$n$ oblivious transfer protocol ot has
2 moves and is determined by a triple of algorithms (query, reply, decode) such that
for any $q_i \in \{1, \ldots, n\}$ and $x \in (\{0,1\}^\ell)^n$, we have $\mathsf{decode}(q_i, \mathsf{reply}(x, \mathsf{query}(q_i))) = x_{q_i}$.
This assumption is not a big restriction, since most practical oblivious transfer
protocols are in this form, and the generalization to multi-round protocols is obvious.
As a second simplification, we use a trusted setup phase for generating the public
parameters. One can eliminate the need for a trusted dealer by running a secure
multiparty protocol, but the explicit use of the trusted setup makes security proofs
more modular.

The underlying idea behind the protocol is rather simple. First, the server uses
a list commitment scheme lc to commit to all inputs $x$. Then the server computes
an intermediate database $s = (s_1, \ldots, s_n)$ of certificates corresponding to every
$x_i$. In a query phase, the client and the server execute the oblivious transfer
protocol ot with the server’s input $s$ to fetch the $q_i$th certificate $s_{q_i}$. If this value
opens a database element $x^*$ that is consistent with the commitment of $x$ and
the query $q_i$, then the client outputs $x^*$; otherwise it returns $\bot$.
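The message flow of Prot. 1 can be traced in the following Python sketch. It is only a structural illustration under stated assumptions: the list commitment is a SHA-256 hash tree with a random salt as $ck$, indices start at $0$, and the `ot_*` functions are a trivial placeholder with no privacy whatsoever (the server sees the query in the clear), standing in for a real private 2-move OT protocol. All function and class names are hypothetical.

```python
import hashlib
import os
from typing import Callable, Optional, Sequence, Tuple


def _h(ck: bytes, tag: bytes, *parts: bytes) -> bytes:
    return hashlib.sha256(ck + tag + b"".join(parts)).digest()


def _depth(n: int) -> int:
    return max(1, (n - 1).bit_length())


def lc_com(ck: bytes, xs: Sequence[bytes]):
    level = [_h(ck, b"\x00", x) for x in xs]
    level += [_h(ck, b"\x02")] * ((1 << _depth(len(xs))) - len(xs))
    levels = [level]
    while len(level) > 1:
        level = [_h(ck, b"\x01", level[j], level[j + 1]) for j in range(0, len(level), 2)]
        levels.append(level)
    return (level[0], len(xs)), (levels, list(xs))


def lc_cert(d, i: int):
    levels, xs = d
    return i, xs[i], [levels[h][(i >> h) ^ 1] for h in range(len(levels) - 1)]


def lc_open(ck: bytes, c, s) -> Optional[Tuple[int, bytes]]:
    (root, n), (i, x, path) = c, s
    if not 0 <= i < n or len(path) != _depth(n):
        return None
    node = _h(ck, b"\x00", x)
    for h, sib in enumerate(path):
        node = _h(ck, b"\x01", sib, node) if (i >> h) & 1 else _h(ck, b"\x01", node, sib)
    return (i, x) if node == root else None


# Placeholder 2-move OT (query, reply, decode): correct but NOT private.
def ot_query(q: int) -> int:
    return q

def ot_reply(s: list, Qi: int):
    return s[Qi] if 0 <= Qi < len(s) else None

def ot_decode(q: int, Ri):
    return Ri


class Server:
    def __init__(self, ck: bytes, xs: Sequence[bytes]):
        self.c, d = lc_com(ck, xs)                         # commitment phase
        self.s = [lc_cert(d, i) for i in range(len(xs))]   # certificate database s

    def reply(self, Qi: int):
        return ot_reply(self.s, Qi)


def client_fetch(ck: bytes, c, server_reply: Callable, q: int) -> Optional[bytes]:
    Ai = ot_decode(q, server_reply(ot_query(q)))           # query phase, steps 1-3
    opened = lc_open(ck, c, Ai) if Ai is not None else None
    if opened is None or opened[0] != q:
        return None                                        # output ⊥
    return opened[1]
```

A cheating server that answers query $q$ with the certificate of another index, or with a modified element, makes the client output `None` ($\bot$), which is exactly the consistency check of step 3 in Prot. 1.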

Theorem 3. If the oblivious transfer protocol ot is computationally private in
the shared setup model and the list commitment scheme lc is binding and equivocal,
then Prot. 1 is a consistent adaptive $m$-out-of-$n$ oblivious transfer protocol
in the polynomial security model, provided that $n^m \in \mathrm{poly}(k)$.

Proof. For the proof, we fix a security parameter $k$, consider an adversary $\mathcal{A}_k$
and show how to convert it into an ideal world adversary $\mathcal{A}^\circ_k$ such that the output
distributions are close enough for any input pair $(\phi_c, \phi_s)$.

Security of honest client. Let $\mathcal{A}_k$ be a corrupted server and $C_k$ an honest
client. As the number of potential queries is polynomial, we can construct a
black-box extractor $\mathcal{K}_{\mathcal{A}_k, C_k}$ that fixes the random coins of the client and the malicious
server, and runs all $n^m$ query sequences in order to recover all valid openings $(j, x_j)$.
As the slowdown is polynomial and the commitment scheme is binding, double
openings of the same position $j$ to different values are revealed with negligible probability. Hence, we can use
$\mathcal{K}_{\mathcal{A}_k, C_k}$ in the construction of the ideal world server. By definition, the oblivious
transfer protocol ot is private in the shared setup model if for any adaptively
chosen input vectors $q = (q_1, \ldots, q_m)$ and $\hat{q} = (\hat{q}_1, \ldots, \hat{q}_m)$ the output distributions
of $\mathcal{A}_k$ are computationally indistinguishable. Hence, we can replace the
missing messages in the ideal world by simulating the honest receiver with input
$\hat{q} = (1, \ldots, 1)$. Combining these results, we consider the following ideal
world adversary $\mathcal{A}^\circ_k$:

1. Run the setup phase to obtain public parameters for lc and ot.

2. Choose randomness $\omega$ and store $(\hat{x}_1, \ldots, \hat{x}_n) \leftarrow \mathcal{K}_{\mathcal{A}_k, C_k}(\phi_s, ck; \omega)$.

3. Send $(\hat{x}_1, \ldots, \hat{x}_n)$ to TTP and specify halting predicates $\pi_1, \ldots, \pi_m$ through
the interaction between the client $C_k(q)$ and the adversary $\mathcal{A}_k(\phi_s, ck; \omega)$, that
is, $\pi_i(q_1, \ldots, q_i) = 1$ iff $C_k$ with inputs $q_1, \ldots, q_i$ outputs $\bot$ on the $i$th query.

4. Output whatever $\mathcal{A}_k(\phi_s, ck; \omega)$ outputs in interaction with $C_k(\hat{q})$.

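Let $(\psi_c, \psi_s)$ denote the outputs of the real execution and $(\psi^\circ_c, \psi^\circ_s)$ the outputs
of the ideal execution. W.l.o.g. we can assume that the output of $\mathcal{A}_k$ contains
$\phi_s$, $ck$ and $\omega$, and thus, given the advice $\phi_c$, we can efficiently compute $\psi_c$ from $\psi_s$,
or $\hat{\psi}_c$ from $\psi^\circ_s$, where $\hat{\psi}_c$ is the output of $C_k(q)$ rerun against $\mathcal{A}_k(\phi_s, ck; \omega)$.
Hence, the distributions $(\psi_c, \psi_s)$ and $(\hat{\psi}_c, \psi^\circ_s)$ must be computationally
indistinguishable, as otherwise we could distinguish $\psi_s$ from $\psi^\circ_s$, which violates
the privacy of ot. Now, note that for fixed $(\phi_s, ck, \omega)$ the corresponding outputs
$\hat{\psi}_c$ and $\psi^\circ_c$ can differ only if the client obtains an opening $x^* \neq \hat{x}_{q_j}$. As this can happen
only with negligible probability, the distributions $(\hat{\psi}_c, \psi^\circ_s)$ and $(\psi^\circ_c, \psi^\circ_s)$ must be
computationally indistinguishable, and thus so are $(\psi_c, \psi_s)$ and $(\psi^\circ_c, \psi^\circ_s)$.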
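The black-box extractor $\mathcal{K}_{\mathcal{A}_k, C_k}$ from this part of the proof can be sketched as an exhaustive replay. In this illustration (our own; indices start at $0$, `None` encodes $\bot$, and `extract_database` and `run_client` are hypothetical names), `run_client(qs)` is assumed to rerun the honest client against the malicious server with all random coins fixed and to return the client's outputs on the query sequence `qs`. Enumerating all $n^m$ sequences is efficient only when $n^m \in \mathrm{poly}(k)$, which is exactly the restriction in Theorem 3.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Sequence


def extract_database(run_client: Callable[[Sequence[int]], List[Any]],
                     n: int, m: int) -> Dict[int, Any]:
    """Replay every query sequence in {0..n-1}^m and collect valid openings.

    Raises ValueError on a double opening, i.e. two different accepted values
    for the same index, which would contradict the binding property.
    """
    recovered: Dict[int, Any] = {}
    for qs in product(range(n), repeat=m):
        for q, out in zip(qs, run_client(qs)):
            if out is None:  # the client output ⊥ on this query
                continue
            if recovered.setdefault(q, out) != out:
                raise ValueError(f"double opening at index {q}")
    return recovered
```

In this sketch, indices that are never opened can be filled with an arbitrary default before the database is sent to TTP, since the client always obtains $\bot$ for them; the halting predicates of step 3 correspond to the positions where `run_client` returns `None`.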

Security of honest server. Since the output of the server in the ideal and
real model is empty, only the output of a malicious client $\mathcal{A}_k$ must be analyzed.
Consider a hybrid implementation of the protocol, where all instances of ot
are replaced with ideal implementations of the oblivious transfer protocol with the
database $s$. Then, as the ot protocol is private in the shared setup model, there
exists an adversary $\mathcal{A}^*_k$ such that the output distributions of $\mathcal{A}_k$ and $\mathcal{A}^*_k$ are
computationally indistinguishable.

To complete the proof, we construct a true ideal world adversary $\mathcal{A}^\circ_k$ and show
that the outputs of $\mathcal{A}^\circ_k$ and $\mathcal{A}^*_k$ are computationally indistinguishable. Indeed,
let the ideal world adversary $\mathcal{A}^\circ_k$ proceed as follows:

1. Generate the equivocality key $(ek, ck) \leftarrow \mathsf{lc.gen}^\circ(1^k)$ and broadcast $ck$.

2. Compute $(c, \eta) \leftarrow \mathsf{lc.com}^\circ_{ek}(n)$ and send $c$ to the adversary $\mathcal{A}^*_k$.

3. If $\mathcal{A}^*_k$ queries $q_j$, obtain $x_{q_j}$ from TTP and reply with $s_j \leftarrow \mathsf{lc.equiv}_{ek}(c, \eta, q_j, x_{q_j})$.

4. Return whatever the adversary $\mathcal{A}^*_k$ finally outputs.

Then it is easy to see that in the hybrid world $\mathcal{A}^*_k$ plays the first equivocality
game with the honest server, and in the ideal world $\mathcal{A}^*_k$ plays the second equivocality
game with a challenger consisting of the simulator $\mathcal{A}^\circ_k$, TTP and the honest
server. To nitpick, $\mathcal{A}^*_k$ does not query all faked decommitment values at once,
but clearly we can write a wrapper that queries all decommitments and then
gradually releases them to $\mathcal{A}^*_k$. Thus, the outputs of $\mathcal{A}^*_k$ and $\mathcal{A}^\circ_k$ must be computationally
indistinguishable, as otherwise $\mathcal{A}^*_k$ together with the distinguisher
would break the equivocality property. □

Corollary 1. If ot is (weakly) statistically server-private and lc is statistically
equivocal, then Prot. 1 is (weakly) statistically server-private.

Proof. If ot is statistically private, then for each $\mathcal{A}_k$ there exists a $\mathrm{poly}(k)$ times
slower $\mathcal{A}^*_k$ such that the output distributions are statistically close. Weak statistical
privacy guarantees only the existence of $\mathcal{A}^*_k$ without bounds on the running
time. Both claims follow as $\mathcal{A}^\circ_k$ is only $\mathrm{poly}(k)$ times slower than $\mathcal{A}^*_k$. □

The limitation that the number of potential queries must be polynomial in $k$
seems to be essential for getting a low-communication solution with a small
computational overhead. To bypass this restriction, we can use list commitment
schemes that are both extractable and equivocal.

Corollary 2. If the oblivious transfer protocol ot is computationally private in
the shared setup model and the list commitment scheme lc is (weakly) white-box
extractable and equivocal, then Prot. 1 is a consistent adaptive $m$-out-of-$n$
oblivious transfer protocol in the (weak) polynomial security model.

Proof. Note that the algorithm pair $(\mathcal{A}_k, C_k)$ can be treated as a compound
adversary, which generates a list commitment $(c, n)$ and then later opens $m$
elements according to the advice $a = (q_1, \ldots, q_m)$. As the commitment scheme
is white-box extractable, there exists an extractor machine $\mathcal{K}_{\mathcal{A}_k, C_k}$ that, given
the parameters $ck$, the server’s input $\phi_s$ and the random tape $\omega$, outputs a list
of candidate elements $\hat{x} = (\hat{x}_1, \ldots, \hat{x}_n)$ such that at the end of the execution $C_k$
accepts an opening $x^* \neq \hat{x}_{q_j}$ only with negligible probability. This extractor can be used in the
simulator construction of Thm. 3 instead of the black-box extractor $\mathcal{K}_{\mathcal{A}_k, C_k}$.

Weak extractability. The same construction is valid for a weakly extractable
commitment scheme. However, in this case, for any notable error bound $\varepsilon(k)$, we
can choose $\mathcal{K}_{\mathcal{A}_k, C_k}$ such that $(\hat{\psi}_c, \psi^\circ_s)$ and $(\psi^\circ_c, \psi^\circ_s)$ are $\varepsilon(k)$-close. As $(\psi_c, \psi_s)$ and
$(\hat{\psi}_c, \psi^\circ_s)$ are computationally indistinguishable, we can guarantee that, for a large
enough $k$, the distributions $(\psi_c, \psi_s)$ and $(\psi^\circ_c, \psi^\circ_s)$ are computationally $2\varepsilon(k)$-close. As
the slowdown is $O(\mathrm{poly}(k)/\varepsilon(k))$, we have established that for any notable error
bound $\varepsilon(k)$, we can construct a polynomial-time ideal world adversary, i.e., the
correspondence $\{\mathcal{A}_k\} \mapsto \{\mathcal{A}^\circ_k\}$ is valid in the weak polynomial model. □

Comparison with Other Protocols. If $n^m$ is polynomial in $k$, then we can use very communication-efficient list commitments that stretch the input by only $O(k \log n)$ bits. By combining them with the most efficient private oblivious transfer protocol E!, we get a protocol with communication complexity $O(k \cdot m \log^2 n)$. Moreover, if we neglect the setup, the amortized round complexity is two messages per query. This is significantly better than the communication complexity $\Omega(mn)$ of the secure adaptive oblivious transfer protocols |1 111016125] that rely on zero-knowledge proofs. With an explicit use of the PCP theorem one can achieve polylogarithmic communication [21], but this approach is only optimal in the asymptotic sense.

As for the computational complexity, note that the additional computational overhead (compared to private protocols) comes from the commitment phase. For a hash-tree-based list commitment scheme, this overhead is $O(n)$ hashing and commitment operations. If the number of queries is bounded or $n^m \in \mathrm{poly}(k)$, then there are no additional costs besides computing the commitment. If the server must handle an unbounded number of queries, the server has to prove that he or she knows how to open the commitment. In a communication-inefficient version of this proof, the server sends all lower-level commitment values $c_1, \ldots, c_n$ to the client and proves knowledge of each decommitment value. The client first checks that the root of the Merkle tree is correct and then verifies the individual proofs. Such zero-knowledge proofs are particularly efficient for Pedersen commitments. Again the overhead is $O(n)$ operations. By using suitable conversion methods [24], we can achieve polylogarithmic communication by increasing the computational overhead by a polynomial factor. Although the construction still relies on the PCP theorem, the underlying proof is much simpler: the server does not have to prove correctness in the query phases.
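A hash-tree list commitment of the kind mentioned above can be sketched in a few lines, which makes the $O(n)$ commitment cost and the logarithmic opening size concrete. This is a minimal illustration under stated assumptions: SHA-256 as the hash, a random 16-byte salt per leaf for hiding, and hypothetical function names (commit_list, open_item, verify). It is not claimed to be extractable or equivocal in the sense required by Corollary 2.

```python
import hashlib
import os
from typing import List, Tuple


def _h(*parts: bytes) -> bytes:
    """SHA-256 over the length-prefixed concatenation of the parts."""
    d = hashlib.sha256()
    for p in parts:
        d.update(len(p).to_bytes(4, "big"))
        d.update(p)
    return d.digest()


def commit_list(values: List[bytes]) -> Tuple[bytes, List[List[bytes]], List[bytes]]:
    """Commit to a non-empty list. Returns (root, tree levels, per-leaf salts).
    Only the root is published; building the tree costs O(n) hash evaluations."""
    salts = [os.urandom(16) for _ in values]            # hiding salt per element
    level = [_h(b"leaf", s, v) for s, v in zip(salts, values)]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                               # pad odd levels by duplication
            level = level + [level[-1]]
            levels[-1] = level
        level = [_h(b"node", level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return level[0], levels, salts


def open_item(levels, salts, values, i: int):
    """Opening of position i: the value, its salt and ceil(log2 n) sibling hashes."""
    path, idx = [], i
    for level in levels[:-1]:
        path.append(level[idx ^ 1])                      # sibling at this level
        idx //= 2
    return values[i], salts[i], path


def verify(root: bytes, i: int, value: bytes, salt: bytes, path: List[bytes]) -> bool:
    """Recompute the root from the leaf and the sibling path."""
    node, idx = _h(b"leaf", salt, value), i
    for sib in path:
        node = _h(b"node", node, sib) if idx % 2 == 0 else _h(b"node", sib, node)
        idx //= 2
    return node == root


records = [b"r1", b"r2", b"r3", b"r4", b"r5"]
root, levels, salts = commit_list(records)
print(verify(root, 2, *open_item(levels, salts, records, 2)))
```

An opening consists of the element, its salt and $\lceil \log_2 n \rceil$ sibling hashes, so with $k$-bit hashes it has size $O(k \log n)$.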

Aumann and Lindell described a 1-out-of-2 oblivious transfer protocol [2] that is secure in the covert model. Although the resulting security guarantees are weaker than for the consistent protocol (see Table [U]), their protocol still has 7 messages and a much higher communication complexity. To be fair, three of those messages are used to implement the trusted setup for the private oblivious transfer, but there are still 4 messages per query and a malicious sender can change its input during the protocol.

5 Consistent Conditional Disclosure of Secrets 

Let $q = (q_1, \ldots, q_n)$ denote the client's vector of inputs and let $x$ be a secret possessed by the server. Then conditional disclosure of secrets (CDS) for a predicate $p$ is a protocol where the client should learn

$$\mathrm{cds}_p(q, x) = \begin{cases} x, & \text{if } p(q) = 1,\\ \bot, & \text{otherwise,} \end{cases}$$

and the server should learn nothing. CDS protocols are often used to convert client-server protocols secure in a semihonest model into protocols that preserve the privacy of inputs in the malicious model; see MU for the details.

In the context of the current work, we are more interested in the direct application of CDS protocols. Namely, a CDS protocol provides a way to distribute a secret only if the client's input satisfies a certain condition, i.e., the client has credentials to access the secret. As an example, consider a video-on-demand service, where a client should obtain a key to a video stream only if his or her balance is non-negative: $\mathit{credit} \ge 0$. However, the server should be unable to tell the client's exact balance. The CDS protocols described in MU consist of two moves, and the client's query consists of ciphertexts of $q_1, \ldots, q_n$. As the CDS protocols of, e.g., [21] are based on an additively homomorphic cryptosystem, the server can do only a limited amount of cryptocomputing to form ciphertexts that decrypt to the secret if the condition $p$ is met. Thus, the client must often send some additional encryptions of auxiliary inputs $w_1, \ldots, w_n$ to help the server, i.e., $q = (\mathit{credit}, w_1, \ldots, w_n)$ in our example. Since the solutions [11,21] provide privacy only in the malicious model, it is difficult for the client to prove that the server maliciously declined access, and the server cannot easily refute false accusations.
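To illustrate how additive homomorphism lets the server build a reply that decrypts to the secret only when the condition holds, the following toy sketch handles the simplest predicate, equality $q = a$ with a server-chosen value $a$, using textbook Paillier with tiny parameters. It is an assumption-laden illustration, not the credit check above and not necessarily the construction of the cited CDS protocols; the key sizes and the function names are hypothetical.

```python
import math
import secrets

# Toy Paillier keys: far too small for any real security.
P, Q = 1009, 1013
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)        # requires Python 3.9+
MU = pow(LAM, -1, N)                # correct because we use g = N + 1


def enc(m: int) -> int:
    """Paillier encryption with g = N + 1: (1 + m*N) * r^N mod N^2."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + (m % N) * N) * pow(r, N, N2) % N2


def dec(c: int) -> int:
    """Paillier decryption: L(c^LAM mod N^2) * MU mod N with L(u) = (u - 1) / N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def cds_equality_reply(c_q: int, a: int, secret: int) -> int:
    """Server side for the predicate [q == a]: returns Enc(s*(q - a) + secret)
    for a fresh random s != 0, using only ciphertext additions and scalings."""
    s = secrets.randbelow(N - 1) + 1
    blinded = pow(c_q * enc(-a) % N2, s, N2)        # Enc(s * (q - a))
    return blinded * enc(secret) % N2                # Enc(s * (q - a) + secret)


# The client encrypts its input q; it learns the secret only if q == a.
print(dec(cds_equality_reply(enc(42), 42, 7)))      # 7
```

When $q \ne a$, the decrypted value $s(q - a) + x$ is uniformly distributed provided $q - a$ is invertible modulo $N$, which for realistic moduli fails only with negligible probability.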

Now consider an extended CDS protocol, where the server first publicly commits to $x$ and the CDS protocol is executed to recover the corresponding decommitment value. As the proofs of Thm. 3 and Cor. 2 directly generalize, the resulting protocol is consistent under the same assumptions. If the set of plausible client inputs is exponential, the exhaustive knowledge-extraction technique from Thm. 3 becomes infeasible, and the construction in which the server proves that he or she knows how to open the commitment is the only option. The latter is not a big problem, as many conventional commitment schemes have efficient proofs of knowledge of the opening. For instance, the equivocal Pedersen commitment scheme has this property. Also, note that the server does not have to prove knowledge of the decommitment value to everybody. It suffices that the server proves it to a respected peer during an initialization phase. If we can guarantee that such auditors are semihonest, then we can further optimize the proof.
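For concreteness, a Pedersen commitment together with the standard three-move honest-verifier proof of knowledge of its opening can be sketched as follows. The group parameters are toy assumptions ($P = 2Q + 1$ with $Q$ prime, and $G$, $H$ small squares of order $Q$); in practice $\log_G H$ must be unknown to the committer. Function names are hypothetical.

```python
import secrets

# Toy prime-order subgroup: P = 2*Q + 1 with Q prime. Far too small for real use.
P, Q = 2039, 1019
G, H = 4, 9   # non-trivial squares mod P, hence of order Q; log_G(H) is assumed unknown


def commit(m: int, r: int) -> int:
    """Pedersen commitment C = G^m * H^r mod P."""
    return pow(G, m % Q, P) * pow(H, r % Q, P) % P


def prove_opening(m: int, r: int, get_challenge):
    """Three-move honest-verifier proof of knowledge of (m, r) with C = G^m H^r."""
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    t = commit(a, b)                                 # prover's first message
    e = get_challenge(t)                             # verifier's random challenge
    return t, e, (a + e * m) % Q, (b + e * r) % Q    # responses


def verify_opening(c: int, t: int, e: int, s1: int, s2: int) -> bool:
    """Accept iff G^s1 * H^s2 == t * C^e (mod P)."""
    return commit(s1, s2) == t * pow(c, e, P) % P


c = commit(123, 456)
transcript = prove_opening(123, 456, lambda t: secrets.randbelow(Q))
print(verify_opening(c, *transcript))               # True
```

The check succeeds because $G^{a + em} H^{b + er} = G^a H^b \cdot (G^m H^r)^e$, and the whole proof costs a constant number of exponentiations per commitment.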

Moreover, Thm. 1 assures that the client can prove to third parties that the server acts maliciously. As anybody can repeat the second phase of a CDS protocol listed in MU with a different secret $x$, the corresponding honest-verifier zero-knowledge proof is very efficient. The complaining client has to reveal $x$ to the prover and then additionally prove (in zero-knowledge if necessary) that the reply of the server is invalid.

The ability to complain makes CDS protocols very appealing for TV or military broadcasts with complex access policies, where credentials are granted by giving out random keys. This problem is commonly known as private inference control [31]. In this setting, a server holds a database of private keys that are used to encrypt various content, e.g., documents with different confidentiality levels. Clients have acquired different credentials, and the server's task is to release the correct keys. For security reasons, the server should not learn which documents are accessed by different clients. At the same time, the server should deny access to clients who do not have appropriate credentials. However, the client should be able to distinguish between denial-of-service attacks, where the server acts maliciously, and legitimate denials, where the client has no right to obtain the corresponding key. Moreover, to make the service accountable against inside attacks, the client should be able to prove to third parties that the denial is illegitimate.

We emphasize that the proofs of knowledge can be skipped if it is possible to force the server to construct the commitments to the keys semihonestly during the initialization phase, either by organizational means or by auditing. As efficient CDS protocols exist for all NP/poly predicates [21], we have established that accountable private inference control is possible. More importantly, the solution is practical if a complaining client is willing to reveal his or her input.


6 Discussion and Open Problems 

Both solutions for oblivious transfer and conditional disclosure of secrets are 
based on a simple principle: the server first creates a list of possible answers and 
commits to it. Since all answers are independent of each other and a client can 
verify that the answer is correct, the server has to prove only that he or she knows how to decommit, not that all answers are consistent with a single input of the server.
As soon as the answers must satisfy a certain constraint or the client cannot 
check whether he or she obtained a decommitment value for a correct answer, 
the construction of a consistent protocol becomes much more complicated. 

Nevertheless, any such protocol must give rise to a list commitment scheme. Indeed, we can view any client-server protocol for computing $f(q, x)$ as a compact commitment to a list with elements $x_q = f(q, x)$, where $q$ ranges over all plausible values. For three-move protocols, the first message is the commitment, and the second message together with the third corresponds to an interactive opening procedure. The second and third messages can be compacted into a single decommitment value provided that a colluding client and server cannot fool third parties who know the first message. As the query should not leak information about other entries, constructing such commitment schemes with implicit correctness guarantees seems a highly non-trivial task. Hence, whether one can construct three-move consistent protocols for other tasks is an interesting question, which might shed light on what types of restrictions are implicitly enforceable by the design of a list commitment scheme.

Another open question is how much can be learned from the complaints and whether it is possible to limit this exposure. By the definition of consistency, a complaint leaks the output of a polynomial-time randomized predicate. In practice, we can further restrict the set of enforceable predicates $\pi$. For instance, one can force memoryless consistency in the oblivious transfer protocol. Namely, a client-server protocol is memoryless-consistent if the halting predicates $\pi_1, \ldots, \pi_m$ are independent of previous queries, i.e., $\pi_i(q_1, \ldots, q_i) = \pi_i(q_i)$, and the server cannot relate the results of different queries.

Theorem 4. Prot. [4] is memoryless-consistent if no instantiations of ot protocols share random variables.

Proof. Assume that an adversary $A$ breaks the memoryless-consistency property of Prot. [4]. That is, it can force the client to abort iff a predicate $\pi_i$ holds on the client's queries $(q_1, \ldots, q_i)$, where $\pi_i$ is a non-trivial function of at least two different values $q_a$ and $q_b$ for $a < b < i$. Since the protocol is stateless, the adversary can play the role of the client in round $b > a$ to breach the privacy of the client in round $a$: given its knowledge of whether the client aborted in round $b$, it will have some advantage in guessing $q_a$, given the value $\pi_i(q_a, q_b)$. □

Analogous results can be stated for protocols consisting of several CDS protocols. However, memoryless consistency has a certain cost. Many efficient protocols for oblivious transfer [3011122] and CDS [11,21] are based on homomorphic encryption. In these protocols, the trusted setup phase assures that the client indeed knows the secret key. In practice, this setup phase is replaced with a corresponding proof of knowledge. Now, if each sub-protocol has a different key pair, the preprocessing phase becomes rather complex. Hence, it is beneficial to share the key among many protocol instances; see [21] for further details.

By doing so we lose memoryless consistency, and thus a natural question arises: can we still bound the set of enforceable halting predicates? As all of these protocols send the client's input in encrypted form to the server and the replies are also encryptions, it is easy to force affine predicates. Given a list of encryptions $\mathrm{Enc}(q_1), \ldots, \mathrm{Enc}(q_i)$, the server can multiply all replies with

$$\mathrm{Enc}\bigl((q_1\alpha_1 + \cdots + q_i\alpha_i - \beta)\, r\bigr) = \bigl(\mathrm{Enc}(q_1)^{\alpha_1} \cdots \mathrm{Enc}(q_i)^{\alpha_i} \cdot \mathrm{Enc}(-\beta)\bigr)^r$$

for a random message-space element $r$. As a result, the replies are unaltered when $q_1\alpha_1 + \cdots + q_i\alpha_i = \beta$ and uniformly distributed otherwise. Consequently, the server can easily force halting predicates corresponding to affine combinations of received ciphertexts, $\pi_i(q_1, \ldots, q_i) = [q_1\alpha_1 + \cdots + q_i\alpha_i = \beta]$. By multiplying replies with several such ciphertexts, the server can also force conjunctions of such affine combinations.
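The affine halting-predicate attack described above can be reproduced with textbook Paillier and toy parameters. In this sketch, which rests on stated assumptions, the "reply" is simply an encryption of 99 standing in for a server reply, $r$ is drawn as a non-zero scalar, and the function name force_affine is hypothetical.

```python
import math
import secrets

# Toy Paillier keys (insecure sizes), g = N + 1.
P, Q = 1009, 1013
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)        # requires Python 3.9+
MU = pow(LAM, -1, N)


def enc(m: int) -> int:
    """Paillier encryption: (1 + m*N) * r^N mod N^2."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + (m % N) * N) * pow(r, N, N2) % N2


def dec(c: int) -> int:
    """Paillier decryption: L(c^LAM mod N^2) * MU mod N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def force_affine(reply: int, enc_qs, alphas, beta: int) -> int:
    """Multiply a reply by (Enc(q_1)^a_1 ... Enc(q_i)^a_i * Enc(-beta))^r.
    The plaintext is unchanged iff sum(a_j * q_j) == beta (mod N)."""
    acc = enc(-beta)
    for c, a in zip(enc_qs, alphas):
        acc = acc * pow(c, a % N, N2) % N2      # adds a * q_j to the plaintext
    r = secrets.randbelow(N - 1) + 1            # random non-zero scalar
    return reply * pow(acc, r, N2) % N2


queries = [enc(3), enc(5)]                       # the server sees only ciphertexts
print(dec(force_affine(enc(99), queries, [2, 1], 11)))   # 2*3 + 5 == 11, so 99
```

With $\beta = 12$ instead, the plaintext becomes $99 - r$, so the client would see a garbled reply and halt exactly when the affine test fails.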

Note that these attacks are applicable to any additively homomorphic encryption scheme. Hence, one can ask whether this is a complete description of halting predicates. Of course, this question makes sense only for deterministic predicates, as any client-server interaction can be formalized as a randomized predicate. For deterministic predicates, it is reasonable to compare the behavior of a concrete cryptosystem with its idealized counterpart that is implemented through encryption, decryption and ciphertext-addition oracles. We say that a homomorphic cryptosystem has special cryptocomputing properties if the malicious server can force deterministic predicates that cannot be forced if the underlying cryptosystem is ideal. As there are cryptosystems that allow one to cryptocompute quadratic polynomials [3] and even polynomials of any length jTH], cryptosystems with special properties exist. However, in all of these cases these properties follow directly from the design of the cryptosystem. Thus, it is reasonable to assume that standard additively homomorphic cryptosystems, such as Paillier [2B], have no special properties and that the set of enforceable predicates is limited to affine tests and their conjunctions. Any proof to the contrary would be interesting in itself, as it would extend the set of cryptocomputable predicates.
Abstract. A fundamental privacy problem in the client-server setting is the retrieval of a record from a database maintained by a server so that the computationally bounded server remains oblivious to the index of the record retrieved while the overall communication between the two parties is smaller than the database size. This problem has been extensively studied and is known as computationally private information retrieval (CPIR). In this work we consider a natural extension of this problem: a multi-query CPIR protocol allows a client to extract $m$ records of a database containing $n$ $\ell$-bit records. We give an information-theoretic lower bound on the communication of any multi-query information retrieval protocol. We then design an efficient non-trivial multi-query CPIR protocol that matches this lower bound. This means we settle the multi-query CPIR problem optimally up to a constant factor.

Keywords: Computationally private information retrieval, multi-query 
CPIR, lower bound on communication. 


1 Introduction 

A (single-server) computationally private information retrieval (CPIR) protocol enables a client to query a database without revealing which data it is extracting. Several cryptographic techniques based on various computational hardness assumptions have been proposed for CPIR with sublinear communication. In this paper, we go beyond the well-studied single-query case and investigate communication-efficient CPIR protocols for the case where the client has multiple queries. A multi-query CPIR protocol allows a client to extract $m$ records of a database containing $n$ records of $\ell$ bits each. We give an information-theoretic lower bound on the communication and design a multi-query CPIR protocol that matches this lower bound (up to a constant factor). Our focus in this paper is on the theory of multi-query CPIR, giving an asymptotically communication-optimal multi-query CPIR protocol under a reasonable cryptographic assumption; we leave the tasks of optimizing the computational overhead and closing the constant-factor gap between the upper and lower bounds on the communication as open problems.

oblivious transfer (OT) or symmetric CPIR (SCPIR) protocol. A two-message SCPIR protocol is usually required to be secure in the sense of semisimulatability, first defined by Naor and Pinkas [12].

1.1 Background 

Private Information Retrieval was introduced in [3]. Kushilevitz and Ostrovsky [5] showed that it is possible to do CPIR with sublinear communication. Cachin, Micali and Stadler [2] gave the first CPIR protocol for retrieving one bit out of a database where the communication complexity is polylogarithmic in the database size. To date, the best single-query CPIR protocols in terms of communication are by Gentry and Ramzan [5] and Lipmaa [10], which allow the retrieval of an $\ell$-bit record from the database.

Turning to our problem, there are three trivial solutions to $m$-query CPIR. One option is parallel repetition of a single-query CPIR. In the case of repeating Gentry and Ramzan's CPIR [6], this would result in a communication of $O(m \cdot \log n + m \cdot \ell + m \cdot k)$, where $k$ is a security parameter specifying the length of an RSA modulus. As we will see, this is not optimal.

Another option is to use a single-query CPIR protocol to fetch one $m\ell$-bit element from a $\binom{n}{m}$-element database. As our lower bound shows, this solution has asymptotically optimal communication $O(m \cdot \log_2(n/m) + m \cdot \ell + k)$ when combined with Gentry and Ramzan's CPIR, but unfortunately it increases the server's computation to $\Omega(\binom{n}{m})$, which for many choices of $n$ and $m$ is superpolynomial in the security parameter.

A third option is to transmit the entire database to the client, which costs $n\ell$ bits and is inefficient in terms of communication.
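Plugging sample numbers into the three cost expressions above shows the trade-off. In this sketch every hidden constant is set to 1, an assumption made only for illustration, and the chosen parameters ($n = 2^{20}$, $m = 100$, $\ell = 1$, $k = 2048$) and the function name are hypothetical.

```python
import math


def trivial_costs(n: int, m: int, ell: int, k: int) -> dict:
    """Communication in bits of the three trivial m-query solutions, with all
    constant factors dropped, plus the number of records that the server of
    option 2 (one query on a C(n, m)-record database) must process."""
    return {
        "parallel_repetition": m * math.log2(n) + m * ell + m * k,
        "subset_database": math.log2(math.comb(n, m)) + m * ell + k,
        "whole_database": n * ell,
        "subset_database_records": math.comb(n, m),
    }


costs = trivial_costs(n=2**20, m=100, ell=1, k=2048)
for name in ("parallel_repetition", "subset_database", "whole_database"):
    print(f"{name}: about {costs[name]:.0f} bits")
print(f"option-2 server work: about 2^{math.log2(costs['subset_database_records']):.0f} records")
```

The second option is by far the cheapest in communication, but its server must process a database whose size is exponential in $m$, which is why none of the three options is satisfactory.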

Ishai, Kushilevitz, Ostrovsky and Sahai [7] proposed batch codes for encoding a database over many separate blocks such that a client can extract $m$ records by querying only a smaller number of records from each block. Our solution uses a related strategy, and part of this paper consists of encoding the database in separate blocks that can be queried by separate smaller CPIR protocols. The batch codes of Ishai, Kushilevitz, Ostrovsky and Sahai, however, do not apply directly to our problem. One reason is that their batch codes are optimized to keep the total number of records in all the blocks low, in order to keep the computational complexity low, whereas our solution actually uses an encoding where the total number of records in all the blocks becomes quite large. Another difference is that they only consider the case where the database and the blocks use the same alphabet, for instance $\ell$-bit strings, whereas in some instances we encode the database into blocks of records over a different alphabet.
1.2 Our Contribution

We design a computationally efficient two-message multi-query CPIR protocol with $O(m\ell + m \cdot \log_2(n/m) + k)$ bits of communication, where $k$ is a security parameter specifying the size of an RSA modulus. Server computation is dominated by $O(n\ell)$ group operations, where in both cases the constant in the big-O is reasonably small. The client's privacy is based on a variant of the $\phi$-hiding assumption [2,6]. In our construction, we use a multi-query CPIR variant of Gentry and Ramzan's single-query CPIR [5] that works for a restricted set of parameters $(m, n, \ell)$. We present a reduction demonstrating that any multi-query CPIR protocol that works for a restricted set of parameters can be used as a building block to construct a communication-optimal CPIR protocol for any set of parameters.

We also prove that any perfectly correct multi-query (non-private) information retrieval protocol has an information-theoretic lower bound of $\Omega(m \cdot \log_2(n/m) + m\ell)$ bits of communication. Thus, our proposed multi-query CPIR has optimal communication complexity up to a constant factor.

1.3 Challenges and Techniques 

Known techniques suffice for communication-optimal CPIR in the extreme cases, where the number of queries is very small or very large. If $m = \Omega(n)$, the server can send the entire database to the client in the clear, giving a communication complexity of $n\ell = O(m\ell + m\log(n/m))$ bits. If $m = O(1)$, we can invoke Gentry and Ramzan's single-query CPIR $m$ times in parallel to get a communication complexity of $O(\ell + k) = O(m\ell + m\log(n/m) + k)$ bits. We are interested in finding a communication-efficient CPIR protocol for the case where $m$ is between the two extremes. Indeed, if $m = o(n)$, then downloading the entire database at a cost of $n\ell$ bits would be sub-optimal, and when $m = \omega(1)$, simply repeating Gentry and Ramzan's protocol has an additive overhead of $\Omega(mk)$ bits, which makes it a sub-optimal choice.

A first step towards resolving this issue is Gentry and Ramzan's observation [6] that, although they focused on the single-query case, it is also possible to get a restricted multi-query CPIR protocol with their techniques. We will use such a restricted multi-query CPIR scheme as a building block in our construction. The restricted protocol is only communication-optimal for certain choices of $(m, n, \ell)$, though. It encodes the queries as hidden prime powers; however, when $n$ grows, the size of these primes grows as well. When $\ell = \Omega(\log n)$ or $m = O(n^\epsilon)$ for a constant $\epsilon > 0$, this turns out not to be a problem, but when $\ell$ is small and $m = \omega(n^\epsilon)$, the increase of the prime size causes a loss of bandwidth of up to a factor $\log n$.

To eliminate the overhead of up to a factor $\log n$ in the communication complexity, we will encode the database in such a way that it can be split into smaller pieces that can be processed by the restricted multi-query CPIR protocol. One part of this encoding consists of dividing the database into smaller blocks that will be treated separately. With smaller blocks, we need smaller primes in the Gentry-Ramzan CPIR to specify a particular index of a record, and this improves the communication complexity. To spread the queries evenly over the blocks, we first let the client choose a random permutation of the database. To preserve the sublinear communication complexity, the client does this by sending the server a seed for a pseudorandom number generator from which the longer full permutation can be generated.
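The seed-to-permutation step can be sketched as follows. The text does not fix a generator or a shuffling method here, so SHA-256 in counter mode as the PRG and a Fisher-Yates shuffle with rejection sampling are assumptions of this sketch, as are the function names. The point is that a short seed determines a full permutation that both parties can recompute, and that the permutation spreads clustered queries over the blocks.

```python
import hashlib
from collections import Counter
from typing import Iterable, Iterator, List


def _prg_words(seed: bytes) -> Iterator[int]:
    """Assumed PRG: SHA-256 in counter mode, yielding 64-bit words."""
    ctr = 0
    while True:
        block = hashlib.sha256(seed + ctr.to_bytes(8, "big")).digest()
        for j in range(0, 32, 8):
            yield int.from_bytes(block[j:j + 8], "big")
        ctr += 1


def permutation_from_seed(seed: bytes, n: int) -> List[int]:
    """Fisher-Yates shuffle of range(n) driven by the PRG; rejection sampling
    removes the modulo bias. Both parties can recompute it from the seed."""
    words = _prg_words(seed)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        bound = i + 1
        limit = (1 << 64) - (1 << 64) % bound     # largest multiple of bound <= 2^64
        w = next(words)
        while w >= limit:
            w = next(words)
        j = w % bound
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def block_loads(perm: List[int], queries: Iterable[int], num_blocks: int) -> Counter:
    """Number of queried records that fall into each block after permuting,
    where record q moves to position perm[q]."""
    block_size = -(-len(perm) // num_blocks)      # ceiling division
    return Counter(perm[q] // block_size for q in queries)


perm = permutation_from_seed(b"client seed", 1 << 12)
print(sorted(block_loads(perm, range(64), 16).values()))
```

Without the permutation, the 64 consecutive queries in the last line would all land in the first block of 256 records; after permuting, they are spread over the 16 blocks.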

Another part of our encoding is best explained by an example. Suppose we have a database of 4 one-bit records and the client wants to retrieve two records. We can encode the database as a 6-record database containing a 2-bit element for each possible pair of queries (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4) the client could have. This encoding increases the size of the database and the size of the records, but reduces the number of queries the client needs to make. When the client's encoding of queries as primes permits the extraction of many bits at a time, this encoding improves bandwidth since fewer queries are needed. Batch codes [7] address a related encoding problem; however, as explained in the section on related work, none of their batch codes suffices for minimizing communication in our scheme.
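The pair encoding from the example generalizes to any $m$: every sorted $m$-subset of indexes becomes one record holding the $m$ original records. The following sketch, with a hypothetical function name, builds this encoding for the 4-record database with record values chosen here for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def encode_subsets(db: List[int], m: int) -> Dict[Tuple[int, ...], Tuple[int, ...]]:
    """Map every sorted m-subset of 1-based indexes to the tuple of records at
    those indexes: C(n, m) records of m*ell bits each, answerable by one query."""
    n = len(db)
    return {idx: tuple(db[i - 1] for i in idx)
            for idx in combinations(range(1, n + 1), m)}


for pair, bits in encode_subsets([1, 0, 1, 1], 2).items():
    print(pair, bits)
```

The keys appear in the same order as the pairs listed in the text, and a single query for the key $(i_1, i_2)$ returns both requested bits.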

With respect to the lower bound on communication, the challenge is that we must consider all possible multi-query CPIR protocols. Most known CPIR protocols consist of encoding the queries in a single message that is sent to the server, which information-theoretically leads to a lower bound of $\log_2\binom{n}{m} = \Omega(m \cdot \log(n/m))$. Furthermore, $m$ $\ell$-bit strings obviously cannot be communicated using fewer than $m\ell$ bits. For these protocols it is therefore straightforward to get a lower bound of $\Omega(m \cdot \log_2(n/m) + m\ell)$ bits. However, the lower bound also needs to cover multi-query CPIR protocols that work in a different way and may use more rounds. This makes proving the lower bound non-trivial; we do not know of prior work giving such a lower bound even in the single-query case.


1.4 Roadmap 

In Sect. 2 we present the necessary preliminaries. In Sect. 3 we prove a lower bound for the communication complexity of multi-query CPIR (even when privacy is not required). In Sect. 4 we construct a basic restricted multi-query CPIR protocol based on Gentry and Ramzan's work [6]. In Sect. 5 we design a new multi-query CPIR protocol for any parameter values.

2 Preliminaries 

Notation. All our algorithms take as input a security parameter $k$. In the following, we say a function $f$ is negligible if $f(k) = k^{-\omega(1)}$. We write $f(k) \approx g(k)$ if $|f(k) - g(k)|$ is negligible. We write $(\mathit{out}_C, \mathit{out}_D) \leftarrow (C(x), D(y))$ if $C$ on input $x$ and $D$ on input $y$ output respectively $\mathit{out}_C$ and $\mathit{out}_D$ after interacting with each other.

Multi-query CPIR. Consider a database with records $x_1, \ldots, x_n \in \{0,1\}^\ell$.
Informally, a multi-query CPIR protocol is a protocol that allows a client to 


Multi-query CPIR with Constant Communication Rate 




extract $m$ different records $x_{i_1}, \ldots, x_{i_m}$ from the database without revealing which records it extracted. Formally, a multi-query CPIR protocol consists of two interactive polynomial-time Turing machines $C$ and $D$ that we call the client and the server, respectively. Both parties get as input a security parameter $k$ written in unary and additional parameters $m, n, \ell$. The server takes as input $n$ elements $x_1, \ldots, x_n \in \{0,1\}^\ell$. The client takes as input a set of $m$ different indexes $i_1, \ldots, i_m \in \{1, \ldots, n\}$. (Note that since we are interested in minimal communication in the case of fixed $m$, we can assume that all $m$ indexes are different.) The client and server interact, and in the end the client outputs $y_1, \ldots, y_m \in \{0,1\}^\ell$ or a special failure symbol $\bot$. $(C, D)$ is a multi-query CPIR protocol if it satisfies the standard correctness and privacy properties defined below. Intuitively, correctness means that in the case of an honest client and an honest server, the client always retrieves the correct elements $x_{i_1}, \ldots, x_{i_m}$. Privacy is defined in the sense of indistinguishability: given two input tuples $i^0, i^1$ chosen by a malicious server, the server should not be able to guess which of the two tuples the client actually uses.

Definition 1 (Perfect correctness). A multi-query CPIR protocol $(C, D)$ has perfect correctness if for any $k$, any $m, n, \ell = \mathrm{poly}(k)$, and any $i = (i_1, \ldots, i_m)$ and $x = (x_1, \ldots, x_n)$ with $i_j \in \{1, \ldots, n\}$ and $x_j \in \{0,1\}^\ell$, we have that if $(\mathit{out}_C, \mathit{out}_D) \leftarrow (C(1^k, m, n, \ell, i), D(1^k, m, n, \ell, x))$, then $\mathit{out}_C = (x_{i_1}, \ldots, x_{i_m})$.

Definition 2 (Computational privacy). A multi-query CPIR protocol has computational privacy if for all non-uniform polynomial-time adversaries $A$ we have

$$\Pr\left[\begin{array}{l} b \leftarrow \{0,1\},\ (m, n, \ell, i^0, i^1, \mathit{state}) \leftarrow A(1^k),\\ (\mathit{out}_C, \mathit{out}_A) \leftarrow (C(1^k, m, n, \ell, i^b), A(\mathit{state})) \end{array} : \mathit{out}_A = b\right] \approx \frac{1}{2},$$

where $m, n, \ell = \mathrm{poly}(k)$ and $i^j = (i^j_1, \ldots, i^j_m)$ with $1 \le i^j_1 < \cdots < i^j_m \le n$.

3 Lower Bound on (m, n, ℓ)-CPIR Communication

Let the database contain $n$ records of size $\ell$. An $(m, n, \ell)$ information retrieval protocol is a two-party protocol between a client and a server that enables the client to receive any $m$ out of the $n$ records. In this section we establish a lower bound for any perfectly correct $(m, n, \ell)$ information retrieval protocol, private or not. If the protocol consists of the client indicating the desired indices and the server sending the elements at those indices, a straightforward lower bound of $\log_2\binom{n}{m} + m\ell$ bits applies. Establishing that the same lower bound also applies in the general case requires more work. We show that $\Omega(m \cdot \log(n/m) + m \cdot \ell)$ is in fact the lower bound for any perfectly correct information retrieval protocol. The lower bound is information-theoretic and holds even when the client and server are computationally unbounded. The lower bound assumes that the protocol has perfect correctness, so any choice of fixed random tapes also gives a perfectly correct protocol, which means it suffices to prove the lower bound for any fixed pair of random tapes. We will therefore, without loss of generality, assume in the following that the client and server are deterministic.

Denote by $X$ the set of all subsets of $\{1, \ldots, n\}$ of size $m$ and by $Y$ the set of all $n$-tuples over the alphabet $\Sigma = \{0,1\}^\ell$. The output of any $(m, n, \ell)$ information retrieval protocol belongs to the set $Z$ of $m$-tuples over $\Sigma$. We denote by $f : X \times Y \to Z$ the output of the information retrieval.

Any protocol computing $f$ can be represented by a binary tree (cf. [8]) such that each internal node $v$ is labeled by a function $c_v : X \to \{0,1\}$ or $s_v : Y \to \{0,1\}$. The root of the tree is the initial node of the protocol, and an execution involves following a path from the root to a leaf according to the functions of the nodes. Note that the program of the client is determined by all the $c_v(\cdot)$ functions, while the program of the server is determined by all the $s_v(\cdot)$ functions. For the purpose of obtaining the most general lower bound, we make no assumptions on the complexity of these functions. Finally, each leaf holds a value $z \in Z = \{0,1\}^{\ell m}$, which is the output of the client.

For any such protocol, we define the equivalence relation $(x_1, y_1) \sim (x_2, y_2)$ if the two inputs lead the protocol to the same output leaf. For each leaf $\lambda$ there is a different equivalence class $R_\lambda$, and the set of all equivalence classes of $\sim$ is thus parameterized by the set of all leaves. For any $\lambda$, the set $R_\lambda$ is a combinatorial rectangle: $(x_1, y_1) \in R_\lambda$ and $(x_2, y_2) \in R_\lambda$ imply that $(x_1, y_2), (x_2, y_1) \in R_\lambda$. This follows from the fact that the leaves define unique paths from the root and in each node the path the protocol takes depends on only one of the two inputs. A fooling set $F_z$ for some $z \in Z$, on the other hand, is a subset of $X \times Y$ such that for any two distinct $(x_1, y_1), (x_2, y_2) \in F_z$ with $f(x_1, y_1) = f(x_2, y_2) = z$, either $f(x_1, y_2) \ne z$ or $f(x_2, y_1) \ne z$. Fooling sets are useful for lower bounds because no two elements of a fooling set can lie in the same $f$-monochromatic rectangle, so covering $F_z$ requires at least $|F_z|$ such rectangles (a rectangle $R$ is $f$-monochromatic iff $\exists z : (x, y) \in R \Rightarrow f(x, y) = z$). The number of monochromatic rectangles in turn yields a lower bound on the number of leaves in any protocol tree, which then implies a lower bound on the tree's height (which is equal to the total communication) [8]. In a nutshell, the number of leaves in the protocol tree for any protocol computing the function $f$ must be at least $\sum_{z \in Z} |F_z|$, where $F_z$ is a fooling set for the output value $z \in Z$.

Lemma 1. Fix $n, m, \ell$. Let $z = (z_1, \ldots, z_m)$ be such that $\{z_1, \ldots, z_m\} \subsetneq \{0,1\}^\ell$. Define $L(z) := \mathrm{lexmin}(\{0,1\}^\ell \setminus \{z_1, \ldots, z_m\})$, where the function $\mathrm{lexmin}(A)$ denotes the lexicographically smallest element of the set of strings $A$. The set

$$F_z = \{(I, y_1, \ldots, y_n) \mid I = \{i_1, \ldots, i_m\} \subseteq \{1, \ldots, n\},\ y_{i_j} = z_j,\ y_{i'} = L(z)\}$$

is a fooling set of size $\binom{n}{m}$, where the indexes have the ranges $j = 1, \ldots, m$, $i' \in \{1, \ldots, n\} \setminus \{i_1, \ldots, i_m\}$, and $1 \le i_1 < \cdots < i_m \le n$.

Proof. It is obvious that $|F_z| = \binom{n}{m}$ and that any input $(x, y) \in F_z$ satisfies $f(x, y) = z$. We next show that $F_z$ is a fooling set. Let $(I; y_1, \ldots, y_n)$ and $(I'; y'_1, \ldots, y'_n)$ be two distinct elements of $F_z$. Since $I$ determines $y$ uniquely, it must be that $I \ne I'$; as $|I| = |I'| = m$, there is at least one location in $I$ that is not in $I'$. Without loss of generality, say $i_1 \in I \setminus I'$. It follows that $f(I; y'_1, \ldots, y'_n) = (z'_1, \ldots, z'_m) \ne z$, since $z'_1 = y'_{i_1} = L(z) \ne z_1$. □
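For small parameters, Lemma 1 can be checked exhaustively. The sketch below builds $F_z$ exactly as defined (with 0-based indexes and $\ell$-bit strings represented as tuples, conventions chosen here) and tests the fooling-set property by brute force over all pairs; the function names are hypothetical, and this is a sanity check rather than a substitute for the proof.

```python
from itertools import combinations, product
from math import comb


def f(I, y):
    """Output of information retrieval: the records of y at the sorted indexes I."""
    return tuple(y[i] for i in I)


def fooling_set(n, m, ell, z):
    """F_z of Lemma 1 (0-based indexes); z is a tuple of m ell-bit tuples."""
    alphabet = list(product((0, 1), repeat=ell))    # lexicographic order
    L = min(s for s in alphabet if s not in z)       # lexmin of the complement
    F = []
    for I in combinations(range(n), m):
        y = [L] * n
        for j, i in enumerate(I):
            y[i] = z[j]
        F.append((I, tuple(y)))
    return F


def is_fooling_set(F, z):
    """Every element maps to z, and no two distinct elements can be swapped."""
    if any(f(I, y) != z for I, y in F):
        return False
    return all(f(I1, y2) != z or f(I2, y1) != z
               for (I1, y1), (I2, y2) in combinations(F, 2))


z = ((0, 1), (1, 1), (0, 1))                        # m = 3, ell = 2
F = fooling_set(6, 3, 2, z)
print(len(F) == comb(6, 3), is_fooling_set(F, z))   # True True
```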

Observe that if $2^\ell > m$, then trivially $\{z_1, \ldots, z_m\} \subsetneq \{0,1\}^\ell$ for any possible output tuple $z \in Z = \{0,1\}^{\ell m}$, i.e., there are $2^{\ell m}$ possible outputs to which the lemma above applies. On the other hand, when $2^\ell \le m$, the number of possible outputs to which the lemma applies is at least $(2^\ell - 1)^m$. This follows from the fact that there are at least that many $m$-tuples that omit a specific $\ell$-bit string. While this lower bound can be made tighter, it will be sufficient for our communication complexity argument.

Theorem 1. Consider parameters $n, m, \ell$ with $n > m$. The communication complexity of any protocol solving the $(m, n, \ell)$ information retrieval problem is $\Omega(m \cdot \log_2(n/m) + m \cdot \ell)$.

Proof. Consider first the case $\ell > 1$. There are at least $(2^\ell - 1)^m$ possible outputs of the information retrieval protocol for which the corresponding fooling set has cardinality $\binom{n}{m}$ according to Lemma 1. It follows that the communication complexity is lower-bounded by

$$\log_2\left(\binom{n}{m} \cdot (2^\ell - 1)^m\right) = \log_2\binom{n}{m} + m \cdot \log_2(2^\ell - 1) \ge \log_2\binom{n}{m} + m(\ell - 1).$$

In order to obtain the asymptotic bound, we use the fact $\binom{tm}{m} \ge t^m$ for any $t, m \ge 1$; setting $t = n/m$ (and using $\ell - 1 \ge \ell/2$) we obtain the statement of the theorem. Next, consider the case $\ell = 1$ and $2m \le n$. In this case, there are 2 outputs (all zeros and all ones) for which the fooling set has cardinality $\binom{n}{m}$. It follows that the communication complexity is at least $\log_2\binom{n}{m} = \Omega(m \cdot \log(n/m))$, which dominates $m\ell = m$ because $\log_2(n/m) \ge 1$. Finally, in case $\ell = 1$ and $2m > n$, trivially the communication is at least $m$ bits, from which the statement of the theorem follows. □
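A numeric spot-check of the chain of inequalities in the case $\ell > 1$ follows. It assumes base-2 logarithms and verifies $\log_2\binom{n}{m} + m\log_2(2^\ell - 1) \ge m\log_2(n/m) + m(\ell - 1)$ on a small grid. This is a check, not a proof.

```python
from math import comb, log2


def fooling_lower_bound(n, m, ell):
    """log2 of (2**ell - 1)**m outputs, each with a fooling set of size binom(n, m)."""
    return log2(comb(n, m)) + m * log2(2 ** ell - 1)


def theorem1_form(n, m, ell):
    """m*log2(n/m) + m*(ell - 1), via binom(tm, m) >= t**m and 2**ell - 1 >= 2**(ell - 1)."""
    return m * log2(n / m) + m * (ell - 1)


print(round(fooling_lower_bound(64, 8, 4), 2), round(theorem1_form(64, 8, 4), 2))
```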


4 Restricted Multi-query CPIR 


In Sect. 5 we will construct a multi-query CPIR with communication complexity $O(m\ell + m \cdot \log(n/m) + k)$, where $k$ is a security parameter. As a building block, it will use a multi-query CPIR protocol $(C, D)$ with communication complexity $k$. Since communication is bounded by $k$, such a CPIR cannot handle all choices of $(m, n, \ell)$. What we need from the building block is that it can be used whenever

$$m \le \frac{\alpha k}{\ell + \log_2 n} \qquad (1)$$

for some constant $0 < \alpha < 1$.

In this section, we provide a construction of such a building block, which we call a restricted multi-query CPIR. It is based on the previous single-query CPIR of Gentry and Ramzan [6] and their observation that it can be extended to the multi-query setting. The security of the restricted multi-query CPIR relies on the $\phi$-hiding assumption of Cachin, Micali and Stadler [2]. It is a 2-message multi-query CPIR protocol, so we describe it by three algorithms (Query, Response, Extract). These generate the client's query and the server's response, respectively, and finally allow the client to extract the records from the response.

In the following, $\Pi$ will be a deterministic polynomial-time algorithm that takes $n$ as input and generates $n$ (small) prime numbers. $K$ will be a probabilistic polynomial-time key generator that takes as input the security parameter $k$ and an integer $\pi < 2^{2\alpha k}$ with factors in the list generated by $\Pi$, where $\alpha$ is a constant parameter. On such an input it generates a triple $(G, g, q)$ such that $G$ is a group with efficiently computable operations, $q$ is a positive integer, and $g$ is an element of this group with $\mathrm{ord}(g) = \pi q$. We require that the descriptions of $G$ and of group elements are at most $k/3$ bits each, and that $K$ satisfies the following assumption:

Definition 3 (Decision Subgroup Assumption). There exists some $\alpha \in (0,1)$ such that for all probabilistic polynomial-time $A$,

$$\Pr\left[b \leftarrow \{0,1\},\ (1^n, \pi_0, \pi_1, \mathit{state}) \leftarrow A(1^k),\ (G, g, q) \leftarrow K(1^k, \pi_b) : A(G, g, \mathit{state}) = b\right] \approx \frac{1}{2},$$

where the adversary outputs positive integers $\pi_0, \pi_1 < 2^{2\alpha k}$ with factors in $\Pi(n)$.

As an example, we may choose $N = PQ$ as a $k/3$-bit RSA modulus, where $P = 2\pi r + 1$, $Q = 2st + 1$, and $r$, $s$ and $t$ are large random positive integers. We then select $g$ as a random element of $\mathbb{Z}_N^*$ that satisfies $\mathrm{ord}(g) = \pi q$ for some $q$ with $\gcd(\pi, q) = 1$. When $\Pi$ generates a set $p_1, \dots, p_n$ where $2n < p_1 < \dots < p_n$, Gentry and Ramzan [6] show that the assumption above reduces to a variant of the $\phi$-hiding assumption¹:

$$\Pr\left[b \leftarrow \{0,1\},\ (1^n, \pi_0, \pi_1, \mathit{state}) \leftarrow A(1^k),\ N \leftarrow \mathrm{RSAK}(1^k, \pi_b) : A(N, \mathit{state}) = b\right] \approx \frac{1}{2},$$

where $A$ outputs $\pi_0$ and $\pi_1$ as described above, and RSAK outputs a $k/3$-bit RSA modulus $N$. In this instantiation, the parameter $\alpha$ must be selected appropriately to avoid Coppersmith's factorization attacks when a large factor of $\phi(N)$ is known. Specifically, $N$ can be factored once a factor of $\phi(N)$ larger than $N^{1/4}$ is known (Coppersmith [4,5]; cf. [1] and the related discussion in [6]), so we need to choose

$$\alpha < 1/25.$$

¹ Strictly speaking, Gentry and Ramzan only show this for $\pi$ being a prime power. However, their proof carries over without change to the more general case where $\pi$ can be a composite number.

Indeed, with this choice we have $\pi < 2^{2k/25}$. This is smaller than $2^{(k/3-1)/4} \le N^{1/4}$ as long as $k > 75$.
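The threshold $k > 75$ follows from comparing exponents: $2k/25 < (k/3 - 1)/4$. The sketch below checks this in exact rational arithmetic. It takes $\alpha$ at the boundary value $1/25$; the strict requirement $\alpha < 1/25$ only makes the inequality easier. The helper name is ours.

```python
from fractions import Fraction


def below_quarter_root(k, alpha=Fraction(1, 25)):
    """Is 2*alpha*k < (k/3 - 1)/4, i.e. 2**(2*alpha*k) < 2**((k/3 - 1)/4) <= N**(1/4)?"""
    return 2 * alpha * k < (Fraction(k, 3) - 1) / 4


print([k for k in range(1, 200) if not below_quarter_root(k)][-1])  # 75
```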

Given $(\Pi, K)$, we can construct a 2-message $(m, n, \ell)$-CPIR following the protocol of Gentry and Ramzan. This restricted $(m, n, \ell)$-CPIR protocol works for choices of the parameters that satisfy Eq. (1):

Query: Let $p_1, \dots, p_n$ be the primes generated by $\Pi$. Let $\pi_1, \dots, \pi_n$ be the smallest prime powers of $p_1, \dots, p_n$ that are larger than $2^\ell$, i.e., $\pi_i = p_i^{\lceil \ell/\log_2 p_i \rceil}$. Let $i_1, \dots, i_m$ be the distinct indices of the elements the client wants to extract from the database. Define $\pi = \prod_{j=1}^m \pi_{i_j}$ and run $(G, g, q) \leftarrow K(1^k, \pi)$. Send $(G, g)$ to the server, and store $q$ for later use.

Response: Given a database of $\ell$-bit elements $x_1, \dots, x_n$, use the Chinese remainder theorem to compute $x'$ such that $x' \equiv x_i \bmod \pi_i$ for $1 \le i \le n$, and send $c = g^{x'}$ to the client.

Extract: For each $1 \le j \le m$, compute $c_j = c^{q\pi/\pi_{i_j}}$ and $g_j = g^{q\pi/\pi_{i_j}}$, and find $x_{i_j}$ such that $c_j = g_j^{x_{i_j}}$. Output $(x_{i_1}, \dots, x_{i_m})$.

Observe that the extraction step requires solving $m$ instances of the discrete logarithm problem within the cyclic groups $\langle g_j \rangle$ for $j = 1, \dots, m$, where the order of each such subgroup is $\pi_{i_j}$. Since $\pi_{i_j}$ is a power of the prime $p_{i_j}$, the extraction requires $O(m\sqrt{p_n}\,\ell/\log_2 p_1)$ steps when the user computes the $x_{i_j}$'s with baby-step giant-step techniques, as in the Pohlig-Hellman algorithm [15]. The server's computation consists of one $O(\ell n)$-bit exponentiation, as in [6].
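The following Python sketch runs Query, Response and Extract end to end at toy size. It is insecure and only exercises the arithmetic. As a hypothetical stand-in for $K$, it uses a prime modulus $P$ with $\pi q \mid P - 1$ instead of an RSA modulus, so the group order is public. $\Pi$ is the "first $n$ primes larger than $2n$" choice discussed in the text. The function names are ours, and indices are 0-based. Extraction follows the Pohlig-Hellman approach described above, recovering one base-$p$ digit at a time with baby-step giant-step.

```python
import random
from math import isqrt, prod

def is_prime(x):
    """Trial division; adequate only for toy sizes."""
    return x >= 2 and all(x % d for d in range(2, isqrt(x) + 1))

def prime_factors(x):
    fs, d = set(), 2
    while d * d <= x:
        while x % d == 0:
            fs.add(d)
            x //= d
        d += 1
    return fs | ({x} if x > 1 else set())

def pi_list(n):
    """Stand-in for Pi: the first n primes larger than 2n."""
    primes, c = [], 2 * n + 1
    while len(primes) < n:
        if is_prime(c):
            primes.append(c)
        c += 1
    return primes

def prime_powers(primes, ell):
    """(pi_i, e_i) where pi_i = p_i**e_i is the smallest power of p_i above 2**ell."""
    out = []
    for p in primes:
        e, pe = 1, p
        while pe <= 2 ** ell:
            e, pe = e + 1, pe * p
        out.append((pe, e))
    return out

def toy_keygen(pi):
    """INSECURE stand-in for K(1^k, pi): prime P, g in Z_P^* with ord(g) = pi*q."""
    t = 2
    while not is_prime(t * pi + 1):
        t += 2
    P, q = t * pi + 1, t // 2
    order = pi * q                      # P - 1 == 2 * order, so g = h^2 has order dividing it
    rs = prime_factors(order)
    for h in range(2, P):
        g = pow(h, 2, P)
        if all(pow(g, order // r, P) != 1 for r in rs):
            return P, g, q

def bsgs(base, target, order, mod):
    """Baby-step giant-step: x in [0, order) with base**x == target (mod mod)."""
    m = isqrt(order) + 1
    table, cur = {}, 1
    for j in range(m):
        table.setdefault(cur, j)
        cur = cur * base % mod
    step, gamma = pow(base, -m, mod), target
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * step % mod
    raise ValueError("no logarithm found")

def dlog_prime_power(h, g, p, e, mod):
    """Pohlig-Hellman for ord(g) = p**e: recover x one base-p digit at a time."""
    x, gamma = 0, pow(g, p ** (e - 1), mod)      # gamma has order p
    for s in range(e):
        hs = pow(pow(g, -x, mod) * h % mod, p ** (e - 1 - s), mod)
        x += bsgs(gamma, hs, p, mod) * p ** s
    return x

def query(n, ell, indices):
    primes = pi_list(n)
    pps = prime_powers(primes, ell)
    pi = prod(pps[i][0] for i in indices)
    P, g, q = toy_keygen(pi)
    return (P, g), (q, pi, primes, pps, list(indices))  # send (G, g); keep q

def response(public, db, ell):
    P, g = public
    moduli = [pe for pe, _ in prime_powers(pi_list(len(db)), ell)]
    M = prod(moduli)                    # CRT: x' = x_i (mod pi_i) for all i
    x = sum(xi * (M // mi) * pow(M // mi, -1, mi) for xi, mi in zip(db, moduli)) % M
    return pow(g, x, P)

def extract(c, public, secret):
    P, g = public
    q, pi, primes, pps, indices = secret
    out = []
    for i in indices:
        pe, e = pps[i]
        cj, gj = pow(c, q * pi // pe, P), pow(g, q * pi // pe, P)
        out.append(dlog_prime_power(cj, gj, primes[i], e, P))
    return out

db = [random.randrange(2 ** 5) for _ in range(6)]   # n = 6 records, ell = 5
pub, sec = query(6, 5, [0, 2, 5])
print(extract(response(pub, db, 5), pub, sec) == [db[0], db[2], db[5]])
```

A real instantiation would replace `toy_keygen` with an RSA-based generator of hidden order. Correctness relies only on $\mathrm{ord}(g) = \pi q$: raising to $q\pi/\pi_{i_j}$ projects $c$ into the subgroup of order $\pi_{i_j}$, where the exponent is $x' \bmod \pi_{i_j} = x_{i_j}$.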

Now that we have given the protocol, let us explain the constraints on the parameters. By definition we have $1 \le m, n, \ell = k^{O(1)}$, but we need more limiting constraints since we use only $k$ bits of communication. Since the $\pi_i$ are chosen as the smallest prime powers of $p_i$ that are larger than $2^\ell$, we have $\pi_i < 2^\ell p_n = 2^{\ell + \log_2 p_n}$. This means $\pi < 2^{m(\ell + \log_2 p_n)}$, so we have $\pi < 2^{2\alpha k}$ whenever

$$m < \frac{2\alpha k}{\ell + \log_2 p_n}.$$

Using the constraints in the example given by Gentry and Ramzan based on RSA moduli, we may use a $\Pi$ that generates the first $n$ primes larger than $2n$. We can use the following crude bound on the primes: $2n < p_1 < \dots < p_n < 2n^2$ for $n \ge 2$. Then $\log_2 p_n < 1 + 2\log_2 n$, so for $\ell \ge 1$ we get $\ell + \log_2 p_n < 2(\ell + \log_2 n)$, and Eq. (1) implies the constraint above. For $n \ge 2$, we can therefore use the restricted multi-query CPIR protocol whenever Eq. (1) holds.
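The crude prime bound, and the step from Eq. (1) to the constraint on $m$, can be spot-checked numerically. The sketch assumes $\Pi$ is exactly "the first $n$ primes larger than $2n$", as suggested above. It uses trial division, which is only suitable for small $n$.

```python
from math import isqrt, log2


def is_prime(x):
    """Trial division; fine for the small n used here."""
    return x >= 2 and all(x % d for d in range(2, isqrt(x) + 1))


def first_primes_above_2n(n):
    """The choice of Pi discussed in the text: the first n primes larger than 2n."""
    primes, c = [], 2 * n + 1
    while len(primes) < n:
        if is_prime(c):
            primes.append(c)
        c += 1
    return primes


def crude_bound_holds(n):
    """2n < p_1 < ... < p_n < 2n^2 for this choice of Pi."""
    p = first_primes_above_2n(n)
    return 2 * n < p[0] and p[-1] < 2 * n * n


print(first_primes_above_2n(4))                     # [11, 13, 17, 19]
print(all(crude_bound_holds(n) for n in range(2, 120)), crude_bound_holds(1))
# With ell >= 1 the bound gives ell + log2(p_n) < 2 * (ell + log2(n)):
pn = first_primes_above_2n(50)[-1]
print(1 + log2(pn) < 2 * (1 + log2(50)))
```

The case $n = 1$ fails the bound ($p_1 = 3 > 2n^2$), which is consistent with the restriction to $n \ge 2$.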

We observe that when $n = 1$ we do not need any security assumption, since the client has only one choice of index to query and privacy is therefore not a concern. We also remark that if the key generation algorithm has negligible failure probability, we still have computational privacy if the client reveals the indices $i_1, \dots, i_m$ on key generation failure. This means we can get perfect correctness in the CPIR. In conclusion, we have the following theorem:

Theorem 2. If the decision subgroup assumption (Definition 3) holds for a constant $0 < \alpha < 1$, there exists a 2-message multi-query CPIR with perfect correctness, computational privacy and $k$ bits of communication for parameters $(m, n, \ell)$ satisfying Eq. (1).

Proof. Follows from the discussion above. □

DISCUSSION. Recall that by Eq. (1), $k = \Omega(m \log_2 n + m\ell)$; thus the restricted protocol has communication $\Omega(m \log_2 n + m\ell)$. It may seem that we achieve no gain over the $m$-times parallel repetition of Gentry-Ramzan's CPIR protocol, which has communication $\Theta(m \log_2 n + m\ell + mk)$ for some security parameter $k$. However, the gain is actually quite significant, especially when $k \gg \log n$.

For example, consider the case $\ell = 1$. Then $m$-times repetition of the Gentry-Ramzan protocol gives a multi-query protocol with communication $m \cdot k$. Now suppose that $m = \sqrt{k}$ and $n = k^{2/3}$. The number of bits used in the transcript is then $k^{3/2}$. On the other hand, for the restricted multi-query protocol we have $\sqrt{k} \le \alpha k/(1 + \log_2 k)$ for sufficiently large $k$, so Eq. (1) holds and we get a protocol with communication $k$. Thus, for large values of $m$, the protocol of this section outperforms the $m$-times repetition of Gentry-Ramzan's single-query CPIR protocol.
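To make the comparison concrete, the following sketch evaluates Eq. (1) and both transcript lengths for $\ell = 1$, $m = \sqrt{k}$ and $n = k^{2/3}$. The choice $\alpha = 1/26$ is an illustrative assumption (any constant below $1/25$ fits the RSA-based instantiation). $k$ is restricted to powers of two so that $m$ and $n$ are integers.

```python
from math import log2


def eq1_holds(m, n, ell, k, alpha):
    """Eq. (1): the k-bit restricted CPIR applies if m <= alpha*k / (ell + log2 n)."""
    return m <= alpha * k / (ell + log2(n))


alpha = 1 / 26                          # illustrative constant below 1/25
for e in (12, 42):                      # k = 2**e, m = sqrt(k), n = k**(2/3), ell = 1
    k, m, n = 2 ** e, 2 ** (e // 2), 2 ** (2 * e // 3)
    print(e, eq1_holds(m, n, 1, k, alpha), "repetition:", m * k, "restricted:", k)
```

For $k = 2^{12}$, Eq. (1) fails. For $k = 2^{42}$, it holds and the restricted protocol sends $k$ bits instead of $k^{3/2}$. This illustrates the "sufficiently large $k$" qualifier.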

5 Communication-Optimal Perfectly Correct Multi-query CPIR

Our communication-optimal reduction of arbitrary multi-query CPIR to restricted multi-query CPIR uses the restricted CPIR protocol from [6] described above and a pseudorandom number generator PRG. Our transformation operates in four different modes depending on the choice of the parameters $(m, n, \ell)$. We examine these modes of operation in the following four subsections. The most challenging case is the one where $m$ is relatively large, but not so large that the trivial protocol sending the whole database is a good solution. We start with the easier cases.

5.1 Multi-query $(m, n, \ell)$-CPIR for Constant $n/m$

When $n = O(m)$, it is asymptotically communication-optimal to send the entire database to the client. For concreteness, we fix the implicit constant in the big-O notation to be 9 and send the entire database to the client whenever $n \le 9m$. This is clearly a 1-message multi-query CPIR protocol with perfect correctness and privacy, and optimal communication of $n\ell \le 9m\ell = O(m\ell)$ bits. In this case, the server does no computation beyond what is needed to transmit the database.

5.2 Multi-query $(m, n, \ell)$-CPIR for Small $m$

We now give a simple extension of the restricted multi-query CPIR that is communication-optimal when $m \le k^{2/3}$ and $n > 9m$. We do this by chopping the $\ell$-bit records into smaller pieces of size $e$. This gives us $\lceil \ell/e \rceil$ databases containing $e$-bit strings. We run the restricted multi-query CPIR protocol $(C, D)$ to extract $m$ records from each of these databases. To do this, we must select the parameter $e$ so that the parameter restriction for the restricted CPIR is satisfied.

1. Define $e = \min(\ell, \lfloor \alpha k/m - \log_2 n \rfloor)$.

2. The server splits $(x_1, \dots, x_n)$ into $\lceil \ell/e \rceil$ databases $\{(x_{h,1}, \dots, x_{h,n})\}_{h=1}^{\lceil \ell/e \rceil}$, where all $x_{h,i}$ are $e$-bit strings and $x_i$ is the concatenation of $x_{1,i}, \dots, x_{\lceil \ell/e \rceil, i}$.

3. The client and server run $\lceil \ell/e \rceil$ restricted multi-query CPIR protocols in parallel for $h \in \{1, \dots, \lceil \ell/e \rceil\}$: $(x_{h,i_1}, \dots, x_{h,i_m}) \leftarrow \langle C(1^k, m, n, e, i_1, \dots, i_m), D(1^k, m, n, e, x_{h,1}, \dots, x_{h,n}) \rangle$.

4. The client computes $x_{i_1}, \dots, x_{i_m}$ by concatenating, for each index, the restricted multi-query CPIR outputs $\{(x_{h,i_1}, \dots, x_{h,i_m})\}_{h=1}^{\lceil \ell/e \rceil}$.

5. The client outputs $(x_{i_1}, \dots, x_{i_m})$.
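The chopping reduction of steps 1 to 5 can be sketched as follows. The restricted protocol $(C, D)$ is replaced by a hypothetical ideal functionality that simply returns the requested records. The sketch therefore shows only the bookkeeping (choice of $e$, splitting, reassembly), not privacy. The function names and sample parameters are ours.

```python
import random
from math import ceil, floor, log2


def ideal_restricted_cpir(db, indices):
    """HYPOTHETICAL stand-in for the restricted protocol (C, D): returns the requested
    records directly (no privacy); only its input/output behaviour matters here."""
    return [db[i] for i in indices]


def small_m_cpir(db, indices, ell, k, alpha):
    n, m = len(db), len(indices)
    e = min(ell, floor(alpha * k / m - log2(n)))             # step 1
    assert e >= 1, "k is too small for these parameters"
    pieces = ceil(ell / e)
    mask = (1 << e) - 1
    # Step 2: x_i = x_{1,i} || ... || x_{pieces,i}; the first piece may carry leading zeros.
    sub_dbs = [[(x >> (e * (pieces - 1 - h))) & mask for x in db] for h in range(pieces)]
    # Step 3: one restricted run per sub-database, always with the same indices.
    outputs = [ideal_restricted_cpir(sub, indices) for sub in sub_dbs]
    # Steps 4-5: concatenate the e-bit pieces of each requested record.
    result = []
    for j in range(m):
        x = 0
        for h in range(pieces):
            x = (x << e) | outputs[h][j]
        result.append(x)
    return result, e, pieces


rng = random.Random(0)
db = [rng.randrange(2 ** 12) for _ in range(64)]    # n = 64 records of ell = 12 bits
idx = [3, 17, 40, 63]                               # m = 4
out, e, pieces = small_m_cpir(db, idx, ell=12, k=80, alpha=0.5)
print(out == [db[i] for i in idx], e, pieces)       # True 4 3
```

With these numbers, $e = \lfloor 10 - 6 \rfloor = 4$, so each run satisfies Eq. (1) with equality: $m = 4 = \alpha k/(e + \log_2 n)$.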


The above protocol runs in the same number of rounds as the restricted multi-query CPIR protocol. If $\ell \le \lfloor \alpha k/m - \log_2 n \rfloor$, we need just one copy of the restricted protocol, so the communication complexity is $k$ bits. If $\ell > \lfloor \alpha k/m - \log_2 n \rfloor$, the communication complexity is

$$\left\lceil \frac{\ell}{\lfloor \alpha k/m - \log_2 n \rfloor} \right\rceil \cdot k \le \left(\frac{2m\ell}{\alpha k} + 1\right) k = \frac{2m\ell}{\alpha} + k = O(k + m\ell),$$

provided

$$\left\lfloor \frac{\alpha k}{m} - \log_2 n \right\rfloor \ge \frac{\alpha k}{2m}.$$

The latter condition holds for large enough $k$, because $m \le k^{2/3}$ and $\log_2 n = O(\log_2 k)$ imply

$$2m(1 + \log_2 n) \le \alpha k$$

asymptotically, which in turn implies

$$\frac{\alpha k}{m} - \log_2 n - 1 \ge \frac{\alpha k}{2m}.$$

Note that this protocol can be further optimized: for each $h$ the client uses the same indices and may therefore reuse the same initial query every time.

Lemma 2. The multi-query $(m, n, \ell)$-CPIR protocol described above for $m \le k^{2/3}$ is correct and private, assuming the underlying restricted CPIR protocol has these properties. Moreover, if the restricted CPIR protocol has perfect correctness, the CPIR protocol above also has perfect correctness.

Proof. By the choice of $e$ we guarantee that

$$m \le \frac{\alpha k}{e + \log_2 n},$$

as required by Eq. (1). A hybrid argument shows that (perfect) correctness follows from the (perfect) correctness of the restricted multi-query CPIR protocol. Another hybrid argument shows that if an adversary has advantage $\varepsilon$ in breaking the privacy of the CPIR protocol, then we can break the restricted CPIR protocol with probability

$$\frac{\varepsilon}{\lceil \ell/e \rceil} \ge \frac{\varepsilon}{2m\ell/(\alpha k) + 1},$$

where the last inequality holds for values of

$$k \ge \frac{2m(1 + \log_2 n)}{\alpha}.$$


5.3 Multi-query $(m, n, \ell)$-CPIR for Large Values of $m$ and $\ell \le \log_2(n/m)$

We now consider the case where $9m < n$, $k^{2/3} < m$ and $\ell \le \log_2(n/m)$. Let $b, d \in \mathbb{N}$ be two parameters to be specified below. We split the database into $\lceil n/(bd) \rceil$ blocks of size $bd$, and on each of these blocks we use the restricted multi-query CPIR protocol. If the client's queries happen to be evenly distributed, we only need to extract an average of $bdm/n$ records from each block.

To ensure the uniformity of its queries, the client chooses a seed $s \leftarrow \{0,1\}^k$ for a pseudorandom number generator. From this seed, the client and the server can generate a pseudorandom permutation of the $n$ elements. From now on we can therefore assume that the client's indices $i_1, \dots, i_m$ are randomly distributed. Still, we cannot expect each block to contain exactly $bdm/n$ records that need to be extracted. We therefore choose $a = bm/n$ and extract $2ad$ records from each block. We choose $b, d$ such that $ad$ is large enough that the probability of the pseudorandom permutation placing more than $2ad$ requested records in any single block is negligible.
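The following sketches the permutation step. It assumes the permutation acts on positions (record $i$ moves to position $\psi(i)$) and that blocks are consecutive runs of $bd$ positions. Python's `random.Random` seeded with $s$ stands in for PRG purely for illustration; it is not a cryptographic generator. The parameters are illustrative and chosen so that $a = bm/n$ is an integer.

```python
import random


def permuted_block_loads(seed, n, indices, block_size):
    """Recompute psi from the seed and count the requested records per block.

    random.Random is NOT a cryptographic PRG; it only illustrates that client and
    server can both derive the same permutation from a short seed s."""
    rng = random.Random(seed)
    psi = list(range(n))
    rng.shuffle(psi)                           # record i moves to position psi[i]
    loads = [0] * -(-n // block_size)          # ceil(n / block_size) blocks
    for i in indices:
        loads[psi[i] // block_size] += 1
    return psi, loads


n, m, b, d = 4096, 256, 32, 4                  # a = bm/n = 2, block size bd = 128, ad = 8
client_seed = 2024
indices = random.Random(7).sample(range(n), m)
psi, loads = permuted_block_loads(client_seed, n, indices, b * d)
ad = b * d * m // n
print(max(loads), 2 * ad)   # the client falls back to sending indices if max(loads) > 2ad
```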

Recall that the restricted multi-query CPIR lets us extract $2ad$ records from each block provided that, following Eq. (1),

$$2ad \le \frac{\alpha k}{\ell + \log_2(bd)}.$$

When $\ell$ is small, for instance when $\ell = 1$, this means that we need $k$ bits to extract

$$2ad \le \frac{\alpha k}{\ell + \log_2(bd)} < \frac{\alpha k}{\log_2(bd)}$$

database bits, giving us a non-constant communication rate.

We get around this problem by using an encoding of the block that utilizes the bandwidth more efficiently. The encoding divides each block of size $bd$ into $d$ segments of $b$ records. We then encode each segment by enumerating all possible combinations of $a$ elements that can be drawn from it. This gives an encoded segment of $\binom{b}{a}$ strings of length $a\ell$. On average, we need to extract two $(a\ell)$-bit records from each of the $d$ segments. In reality, the $2ad$ records we need to extract from the block are pseudorandomly distributed over the $d$ segments. However, by extracting $3d$ $(a\ell)$-bit strings from the $d$ segments, we are guaranteed to cover any distribution of $2ad$ records in the block. This is an immediate corollary of the following simple counting lemma:


Lemma 3. Let $a, b, d \in \mathbb{N}$ with $a \le b$, and let $S_1, \dots, S_d$ be disjoint sets with $|S_i| = b$. For any $A \subseteq \bigcup_{i=1}^d S_i$ with $|A| = 2ad$, there exists a family of sets $G_1, \dots, G_t$ such that (i) for each $G_j$ there is some $S_i$ with $G_j \subseteq S_i$, (ii) $|G_j| = a$, (iii) $A \subseteq \bigcup_{j=1}^t G_j$ and (iv) $t \le 3d$.

Proof. Let $A_1, \dots, A_d$ be the partition of $A$ across $S_1, \dots, S_d$, with $|A_i| = a_i$ and $\sum_i a_i = 2ad$. Each $A_i$ can be covered by $\lceil a_i/a \rceil$ subsets of size $a$ from $S_i$. It follows that we can cover $A$ with a number of sets equal to $\sum_{i=1}^d \lceil a_i/a \rceil \le d + (\sum_i a_i)/a = 3d$.
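The covering in the proof of Lemma 3 is constructive. The sketch below uses our own helper and assumes $a \le b$. It builds the sets $G_j$ greedily per segment and pads the last group of each segment with other elements of $S_i$. Those elements may already be covered, since the $G_j$ need not be disjoint.

```python
import random


def cover(segments, A, a):
    """Split each A_i into ceil(a_i / a) groups of size a inside S_i, padding the
    last group with other elements of S_i (requires a <= |S_i|)."""
    groups = []
    for S in segments:
        Ai = sorted(x for x in S if x in A)
        for start in range(0, len(Ai), a):
            chunk = Ai[start:start + a]
            chunk += [x for x in sorted(S) if x not in chunk][:a - len(chunk)]
            groups.append(frozenset(chunk))
    return groups


a, b, d = 2, 6, 4
segments = [set(range(i * b, (i + 1) * b)) for i in range(d)]
A = set(random.Random(1).sample(range(b * d), 2 * a * d))   # |A| = 2ad
G = cover(segments, A, a)
print(len(G), 3 * d)
```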


In conclusion, on each block we use the restricted multi-query CPIR to extract $3d$ out of $d \cdot \binom{b}{a}$ possible $(a\ell)$-bit strings. According to Eq. (1), we can use the restricted multi-query CPIR protocol to do this if we choose $b, d$ such that

$$3d \le \frac{\alpha k}{a\ell + \log_2\left(d \cdot \binom{b}{a}\right)}. \qquad (2)$$


We now state the constraints on the choices of $b$ and $d$ and give a choice of variables that yields optimal communication complexity:

- We want $ad = mb/n \cdot d$ to be large enough that the probability of more than $2ad$ records falling into the same block is negligible.

- We need Eq. (2) in order to use the restricted multi-query CPIR protocol.

- Finally, we want $d \cdot \binom{b}{a}$ to be polynomial in $k$, so that the encoded database contains $k^{O(1)}$ elements and can be processed in time polynomial in $k$.


We first apply a Chernoff bound to the probability that any given block of size $bd$ contains more than $2ad$ records we want to extract. For a fixed block, the probability that more than $2ad$ indices need extraction is smaller than the probability that more than $2ad$ indices end up in the same block when repetition is allowed. The latter probability is $\Pr[X > 2ad]$, where $X = \sum_{j=1}^{bd} X_j$ and $X_1, \dots, X_{bd}$ are independent Bernoulli trials with probability $p = m/n$, so that $\mathbb{E}[X] = bdm/n = ad$. By a Chernoff bound we get

$$\Pr[X > 2ad] \le e^{-ad/3}. \qquad (3)$$
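Equation (3) can be compared with the exact binomial tail for a few illustrative parameter pairs $(bd, m/n)$. The sketch uses exact rational arithmetic from `fractions`, so the tail value is not affected by floating-point underflow.

```python
from fractions import Fraction
from math import comb, exp, floor


def tail_above_twice_mean(N, p):
    """Exact Pr[X > 2*mu] for X ~ Binomial(N, p) with mu = N*p; here N = bd and p = m/n,
    so mu = ad."""
    mu = N * p
    tail = sum(comb(N, j) * p ** j * (1 - p) ** (N - j)
               for j in range(floor(2 * mu) + 1, N + 1))
    return tail, mu


for N, p in [(100, Fraction(1, 10)), (60, Fraction(1, 4)), (200, Fraction(1, 20))]:
    tail, mu = tail_above_twice_mean(N, p)
    print(N, p, float(tail), exp(-mu / 3))   # exact tail vs. Chernoff bound e^{-ad/3}
```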

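For the latter condition to hold, we choose $a = \lceil \log k/\log(n/m) \rceil$ and $b = a\lceil n/m \rceil$, giving

$$\binom{b}{a} \le \left(\frac{eb}{a}\right)^a \le \left(e\left\lceil\frac{n}{m}\right\rceil\right)^{\frac{\log k}{\log(n/m)} + 1} \le e^{\log k + 1} \cdot \left\lceil\frac{n}{m}\right\rceil^{\frac{\log k}{\log(n/m)} + 1} = k^{O(1)}.$$

Since $d = k^{O(1)}$, we therefore have $d \cdot \binom{b}{a} = k^{O(1)}$, so the server runs in polynomial time. We observe for future use that, at the same time,

$$\binom{b}{a} \ge \left(\frac{b}{a}\right)^a = \left\lceil\frac{n}{m}\right\rceil^{a} \ge \max\left(k, \frac{n}{m}\right) \ge k,$$

since $a \ge 1$ and $a \ge \log k/\log(n/m)$.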

As we will see, the above constraints on the parameters suffice to obtain optimal communication complexity. To obtain perfect correctness, the client can check whether all blocks indeed need extraction of at most $2ad$ records. In the unlikely case that this does not hold, the client sends the indices it wants to extract in the clear. This fallback is obviously not private, but it is only invoked with negligible probability. We obtain the following protocol construction:


1. Set $a = \lceil \log k/\log(n/m) \rceil$, $b = a\lceil n/m \rceil$ and $d = \left\lceil \min\left(m/a, (\alpha k/4)/(a\ell + 2\log_2\binom{b}{a})\right) \right\rceil$.

2. The client generates a seed $s \leftarrow \{0,1\}^k$ for the pseudorandom generator and checks that $\psi = \mathrm{PRG}(s)$ is a permutation of the indices such that at most $2ad$ records need to be extracted from each block of size $bd$.

3. In the unlikely event that $\psi$ places more than $2ad$ records to be extracted in the same block, the client sends $i_1, \dots, i_m$ in the clear to the server (encoded so that it uses approximately $\log_2\binom{n}{m}$ bits of communication). The server responds with $(x_{i_1}, \dots, x_{i_m})$, which the client outputs before halting.

4. The client sends $s$ to the server, and the server permutes the indices according to $\psi = \mathrm{PRG}(s)$.

5. The server divides the database into blocks of $bd$ consecutive records and encodes each block as a database of $d\binom{b}{a}$ records of length $a\ell$. Each segment of $\binom{b}{a}$ records contains all possible choices of $a$ $\ell$-bit records from the corresponding segment of $b$ records in the block.

6. The client and the server run the restricted multi-query CPIR protocol $(C, D)$ on the $\lceil n/(bd) \rceil$ encoded blocks of $d\binom{b}{a}$ records, obtaining $3d$ $(a\ell)$-bit strings from each. This corresponds to extracting the up to $2ad$ requested records from each of the original blocks.

7. The client decodes the output and reverses the permutation of the indices to obtain the output $(x_{i_1}, \dots, x_{i_m})$.
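The encoding of one segment in step 5 can be sketched as below. Each of the $\binom{b}{a}$ combinations of positions becomes one $(a\ell)$-bit word, formed by concatenating the corresponding records. Keying a dictionary by the combination is a presentational choice; a real server would lay the words out in a fixed order so that the client can query by position.

```python
from itertools import combinations
from math import comb


def encode_segment(records, a, ell):
    """Map each a-subset of positions in the segment to the a*ell-bit concatenation
    of the corresponding ell-bit records."""
    encoded = {}
    for combo in combinations(range(len(records)), a):
        word = 0
        for i in combo:
            word = (word << ell) | records[i]
        encoded[combo] = word
    return encoded


def decode_word(word, combo, ell):
    """Split an (a*ell)-bit word back into the records at the positions in combo."""
    mask, a = (1 << ell) - 1, len(combo)
    return {combo[t]: (word >> (ell * (a - 1 - t))) & mask for t in range(a)}


records = [5, 0, 7, 3, 1]          # one segment: b = 5 records of ell = 3 bits
table = encode_segment(records, a=2, ell=3)
print(len(table) == comb(5, 2), decode_word(table[(1, 3)], (1, 3), 3))   # True {1: 0, 3: 3}
```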


First, the error probability bound in Eq. (3) is negligible. It is bounded by $e^{-ad/3}$, and for sufficiently large $k$ we have $ad = a \cdot \left\lceil \min\left(m/a, (\alpha k/4)/(a\ell + 2\log_2\binom{b}{a})\right) \right\rceil \ge k^{2/3}$, since $m > k^{2/3}$, $\ell \le \log_2(n/m)$ and $\log_2\binom{b}{a} = \log_2(k^{O(1)})$.

Regarding communication complexity, we first compute it when

$$\frac{\alpha k/4}{a\ell + 2\log_2\binom{b}{a}} \le \frac{m}{a},$$

so that $d = \left\lceil (\alpha k/4)/(a\ell + 2\log_2\binom{b}{a}) \right\rceil$. We send the pseudorandom seed of length $k$ and run the CPIR protocol $\lceil n/(bd) \rceil$ times, for a total communication of

$$\left(\left\lceil \frac{n}{bd} \right\rceil + 1\right) k \le \frac{nk}{bd} + 2k \le \frac{nk}{b \cdot \frac{\alpha k/4}{a\ell + 2\log_2\binom{b}{a}}} + 2k = \frac{4n}{\alpha b}\left(a\ell + 2\log_2\binom{b}{a}\right) + 2k \le \frac{20na}{\alpha b}\log_2\frac{n}{m} + 2k \le \frac{40}{\alpha}\, m\log_2\frac{n}{m} + 2k,$$

where we have used $a\ell + 2\log_2\binom{b}{a} \le a\log_2(n/m) + 2\log_2\left((eb/a)^a\right) \le a\log_2(n/m) + 2a\log_2(e\lceil n/m \rceil) \le 5a\log_2(n/m)$.

Next, we consider the case $d = \lceil m/a \rceil$. The communication complexity is

$$\left(\left\lceil \frac{n}{bd} \right\rceil + 1\right) k \le \frac{nk}{bd} + 2k \le \frac{nka}{bm} + 2k \le 4k.$$

Finally, in the rare cases where the client ends up sending the indices in the clear, the communication complexity is $\log_2\binom{n}{m} + m\ell = O(m \cdot \log_2(n/m) + k)$.
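Both communication bounds can be evaluated for concrete, illustrative parameters. The sketch follows step 1 of the protocol with base-2 logarithms. It counts $\lceil n/(bd) \rceil$ restricted runs of $k$ bits plus the $k$-bit seed, and reports whether Eq. (2) holds. It does not model the fallback case, and the function names are ours.

```python
from math import ceil, comb, log2


def section53_parameters(k, n, m, ell, alpha):
    """Step 1 of the Sect. 5.3 protocol, with base-2 logarithms."""
    a = ceil(log2(k) / log2(n / m))
    b = a * ceil(n / m)
    enc = log2(comb(b, a))                      # log2 of the segment size binom(b, a)
    d = ceil(min(m / a, (alpha * k / 4) / (a * ell + 2 * enc)))
    return a, b, d, enc


def section53_cost(k, n, m, ell, alpha):
    """Bits sent: the k-bit seed plus one k-bit restricted run per block; also Eq. (2)."""
    a, b, d, enc = section53_parameters(k, n, m, ell, alpha)
    bits = (ceil(n / (b * d)) + 1) * k
    eq2 = 3 * d <= alpha * k / (a * ell + log2(d) + enc)
    return bits, eq2


k, n, m, ell, alpha = 2 ** 20, 2 ** 30, 2 ** 15, 8, 0.5
bits, eq2 = section53_cost(k, n, m, ell, alpha)
bound = 40 / alpha * m * log2(n / m) + 2 * k
print(bits, bound, eq2)
```

For this choice the second argument of the minimum is active, so the first bound applies. A parameter set such as $k = 2^{30}$, $n = 2^{28}$, $m = 2^{21}$, $\ell = 2$ instead hits the case $d = \lceil m/a \rceil$, where the total is at most $4k$.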

Lemma 4. The CPIR protocol for $n > 9m$, $m > k^{2/3}$, $\ell \le \log_2(n/m)$ is correct and private. It has perfect correctness if the restricted multi-query CPIR protocol has perfect correctness.

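Proof. The protocol is (perfectly) correct because the restricted CPIR protocol is (perfectly) correct. We just need to verify that the restricted protocol can actually be applied, i.e., that for sufficiently large $k$ we have

$$3d \le \frac{\alpha k}{a\ell + \log_2\left(d \cdot \binom{b}{a}\right)}.$$

To see that this holds, first observe that $d \le k$, because

$$\frac{\alpha/4}{a\ell + 2\log_2\binom{b}{a}} < 1,$$

which follows from $\alpha/4 < 1$ and $a\ell + 2\log_2\binom{b}{a} \ge 1$. From the choice of $d$ in the protocol we now get

$$d \le \left\lceil \frac{\alpha k/4}{a\ell + 2\log_2\binom{b}{a}} \right\rceil \le \left\lceil \frac{\alpha k/4}{a\ell + \log_2\left(d \cdot \binom{b}{a}\right)} \right\rceil \le \frac{\alpha k/3}{a\ell + \log_2\left(d \cdot \binom{b}{a}\right)}.$$

The second inequality follows from $d \le k \le \binom{b}{a}$. The last inequality holds for sufficiently large $k$, since $a\ell + \log_2(d \cdot \binom{b}{a}) = O(\log_2 k)$ and the ceiling adds at most 1.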

With this choice of parameters, we are guaranteed that the restricted multi-query CPIR with communication complexity $k$ bits can be used on each block of size $bd$. An adversary that breaks the privacy of the protocol with probability $\varepsilon$ can therefore be converted into an adversary that breaks the restricted multi-query CPIR with probability $\varepsilon/\lceil n/(bd) \rceil$. The only exception is the negligible probability that the privacy breach is due to a bad pseudorandom seed. Similarly, a hybrid argument shows that the multi-query protocol is correct. When the pseudorandom seed is bad, we step down to a non-private but perfectly correct CPIR. Therefore, if the restricted multi-query CPIR has perfect correctness, our CPIR also has perfect correctness. □


5.4 Multi-query CPIR for $\ell > \log_2(n/m)$

The final case is where $9m < n$, $k^{2/3} < m$ and $\ell > \log_2(n/m)$. We split each database record into $\ell' := \lceil \ell/\lceil \log_2(n/m) \rceil \rceil$ records of length $\lceil \log_2(n/m) \rceil$ bits each. We now need to extract $\ell' \cdot m$ out of $\ell' \cdot n$ records of length $\lceil \log_2(n/m) \rceil$. Using the previous construction, we obtain a multi-query CPIR protocol that does this with communication complexity $O\left(\ell' m \cdot \log_2\frac{\ell' n}{\ell' m} + k\right) = O(m\ell + k)$.
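The record splitting of Sect. 5.4 only changes the database layout and the index set. The following sketch assumes each record is cut most-significant piece first, with piece $t$ of record $i$ stored at position $i\ell' + t$ (our layout choice). The 5.3 protocol is again replaced by a direct lookup.

```python
import random
from math import ceil, log2


def split_long_records(db, indices, n, m, ell):
    """Cut each ell-bit record into ell' = ceil(ell / L) pieces of L = ceil(log2(n/m)) bits
    and map each requested index i to the pieces i*ell' + t."""
    L = ceil(log2(n / m))
    ell_p = ceil(ell / L)
    mask = (1 << L) - 1
    new_db = [(x >> (L * (ell_p - 1 - t))) & mask for x in db for t in range(ell_p)]
    new_indices = [i * ell_p + t for i in indices for t in range(ell_p)]
    return new_db, new_indices, L, ell_p


def join_pieces(pieces, L):
    x = 0
    for p in pieces:
        x = (x << L) | p
    return x


rng = random.Random(3)
n, m, ell = 64, 4, 20                      # ell = 20 > log2(n/m) = 4
db = [rng.randrange(2 ** ell) for _ in range(n)]
idx = [5, 9, 33, 60]
new_db, new_idx, L, ell_p = split_long_records(db, idx, n, m, ell)
got = [new_db[j] for j in new_idx]         # what the Sect. 5.3 protocol would return
recovered = [join_pieces(got[r * ell_p:(r + 1) * ell_p], L) for r in range(m)]
print(L, ell_p, recovered == [db[i] for i in idx])   # 4 5 True
```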

5.5 Summary: Communication-Optimal Multi-query CPIR

Combining the four protocols, we get a communication-optimal multi-query CPIR:

1. If $n \le 9m$, send the entire database to the client.

2. Else if $m \le k^{2/3}$, use the CPIR protocol from Section 5.2 with communication complexity $O(m\ell + k)$.

3. Else if $\ell \le \log_2(n/m)$, use the CPIR protocol from Section 5.3 with communication complexity $O(m \cdot \log_2(n/m) + k)$.

4. Else if $\ell > \log_2(n/m)$, use the CPIR protocol from Section 5.4 with communication complexity $O(m\ell + k)$.
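The case distinction can be written as a small dispatcher. This sketch only selects the subsection and treats $k^{2/3}$ and $\log_2(n/m)$ as real numbers.

```python
from math import log2


def choose_mode(m, n, ell, k):
    """Which sub-protocol of Sect. 5 handles (m, n, ell) for security parameter k."""
    if n <= 9 * m:
        return "5.1"          # send the whole database
    if m <= k ** (2 / 3):
        return "5.2"          # chop records into e-bit pieces
    if ell <= log2(n / m):
        return "5.3"          # PRG permutation + block encoding
    return "5.4"              # split long records, then apply 5.3


print(choose_mode(10, 50, 8, 2 ** 12), choose_mode(50, 10 ** 6, 8, 2 ** 20),
      choose_mode(1000, 2 ** 20, 4, 2 ** 12), choose_mode(1000, 2 ** 20, 64, 2 ** 12))
# 5.1 5.2 5.3 5.4
```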


For sufficiently large $k$, this protocol works for all choices of $(m, n, \ell)$. Its communication complexity is $O(m\ell + m \cdot \log_2(n/m) + k)$, which is optimal up to a constant factor for perfectly correct CPIR. As a corollary to the lemmas in this section, we get the following:

Theorem 3. The CPIR protocol given above is correct and private. It has perfect correctness if the restricted multi-query CPIR protocol has perfect correctness.

Abstract. Recent research has shown that the single-user security of optimistic fair exchange does not guarantee its multi-user security. This paper investigates the conditions under which the security of optimistic fair exchange in the single-user setting is preserved in the multi-user setting. We first introduce and define a property called “Strong Resolution-Ambiguity”. We then prove that, in the certified-key model, an optimistic fair exchange protocol is secure in the multi-user setting if it is secure in the single-user setting and has the property of strong resolution-ambiguity. Finally, we provide a new construction of optimistic fair exchange with strong resolution-ambiguity. The new protocol is setup-free, stand-alone and multi-user secure without random oracles.

1 Introduction 

In a fair exchange protocol, two parties exchange their items fairly, so that neither can gain any advantage in the process. A simple way to realize fair exchange is to introduce an online trusted third party who acts as a mediator. Each party sends its item to the trusted third party, who, upon verifying the correctness of both items, forwards each item to the other party. A drawback of this approach is that the trusted third party is always involved in the exchange, even if both parties are honest and no fault occurs. In practice, the trusted third party could become a bottleneck of the system and is vulnerable to denial-of-service attacks.

Optimistic fair exchange (also known as off-line fair exchange) was introduced by Asokan et al. [1]. An optimistic fair exchange protocol also needs a third party, called the “arbitrator”, but the arbitrator does not need to be online all the time. Instead, the arbitrator is only invoked when something goes wrong (e.g., one party attempts to cheat or other faults occur). An optimistic fair exchange protocol involves three participants, namely the signer, the verifier and the arbitrator. The signer (say, Alice) first issues a verifiable “partial signature” $\sigma'$ to the verifier (say, Bob). Bob verifies the validity of $\sigma'$ and fulfills his obligation if $\sigma'$ is valid. After that, Alice sends Bob a “full signature” $\sigma$ to complete the transaction. Thus, if no problem occurs, the arbitrator does not participate in the exchange. However, if Bob does not receive the full signature $\sigma$ from Alice, he can send $\sigma'$ (together with proof that he fulfilled his obligation) to the arbitrator, who converts $\sigma'$ into $\sigma$ for Bob.

An optimistic fair exchange protocol can be setup-driven or setup-free [23]. It is called setup-driven if it involves an initial key-setup procedure between a signer and the arbitrator. It is called setup-free if the signer never needs to contact the arbitrator, except that the signer can obtain and verify the arbitrator's public key certificate and vice versa. As shown in [10], setup-free protocols are more desirable for realizing optimistic fair exchange in the multi-user setting. Another property of optimistic fair exchange is being stand-alone [23], which requires the full signature to be an ordinary signature.


1.1 Previous Work 

Fair exchange is one of the fundamental problems in secure electronic transactions and digital rights management, and it has been studied intensively since its introduction. Optimistic fair exchange is known to be constructible generically from several primitives:

- the “two signatures” construction [111];
- verifiably encrypted signatures l2lBl8l9llHl20llT?l;
- the sequential two-party multisignature, first introduced by Park et al. hu and later broken and repaired by Dodis and Reyzin [11];
- the OR-proof HH;
- conventional signatures combined with ring signatures [14].

In the following, we only review the results most relevant to this paper.

Optimistic Fair Exchange in the Single-user Setting 

An optimistic fair exchange protocol involves three parties: signer(s), verifier(s) and arbitrator(s). Most work on optimistic fair exchange considered only the single-user setting, in which there is only one signer. The first formal security model of optimistic fair exchange was proposed in [2,3]. Dodis and Reyzin [11] defined a more general and unified model for non-interactive optimistic fair exchange by introducing a new cryptographic primitive called verifiably committed signature. In [11], the security of a verifiably committed signature scheme (equivalently, an optimistic fair exchange protocol) in the single-user setting consists of three aspects: security against the signer, security against the verifier and security against the arbitrator. The arbitrator is not fully trusted, but it is assumed to be semi-trusted in the sense that it will not collude with the signer or the verifier. In the remainder of this paper, an optimistic fair exchange protocol is called single-user secure (or secure in the single-user setting) if it is secure in the single-user setting defined in [11]. Their definition does not include all security notions of optimistic fair exchange (e.g., abuse-freeness [12], non-repudiation HMU, timely termination [2,3] and signer-ambiguity [13]), but this does not affect the point we want to make in this paper. Dodis and Reyzin [11] proposed a stand-alone but setup-driven verifiably committed signature scheme based on the Gap Diffie-Hellman problem. Constructions of stand-alone and setup-free verifiably committed signatures were proposed in [22,23].

Optimistic Fair Exchange in the Multi-user Setting 

Recently, the security of non-interactive optimistic fair exchange in the multi-user setting was studied independently in [10] and [24]. In the multi-user setting there are two or more signers in the system, but items are still exchanged between two parties. This differs from multi-party exchange, which considers exchanges among three or more parties.

In [10], Dodis, Lee and Yum pointed out that the single-user security of optimistic fair exchange does not guarantee its multi-user security. They presented a simple counterexample that is secure in the single-user setting but insecure in the multi-user setting: a dishonest verifier can obtain a full signature without fulfilling the obligation. Dodis, Lee and Yum defined the multi-user security model of optimistic fair exchange. They also gave a generic setup-free construction of optimistic fair exchange that is secure in the multi-user setting [10]. Their construction relies on one-way functions in the random oracle model and on trapdoor one-way permutations in the standard model. The analysis in [10] shows that two well-known techniques, namely constructions based on verifiably encrypted signatures and on sequential two-party signatures, remain secure in the multi-user setting if the underlying primitives satisfy certain security notions.

Independently, Zhu, Susilo and Mu [24] also exhibited a verifiably committed signature scheme that is secure in the model of [11] but insecure in the multi-user setting. They defined security notions of verifiably committed signature in the multi-user setting. They also proposed a concrete multi-user secure, stand-alone and setup-free verifiably committed signature scheme [24]. The non-interactive version of their scheme uses the Fiat-Shamir technique and requires a hash function, which is modeled as a random oracle in the security analysis. Following [10], multi-user secure, stand-alone and setup-free optimistic fair exchange protocols without random oracles can be built from verifiably encrypted signature schemes without random oracles [ISISftllftj].

Certified-Key Model and Chosen-Key Model 

Most optimistic fair exchange protocols are analyzed in the certified-key model, where a user must prove knowledge of the private key during key registration. The adversary is therefore only allowed to make queries about certified public keys. Huang et al. [14] considered the multi-user security of optimistic fair exchange in the chosen-key model. There, the adversary may query arbitrary public keys without showing knowledge of the corresponding private keys. Optimistic fair exchange protocols that are secure in the certified-key model may not be secure in the chosen-key model [14].

Huang et al. [14] proposed another generic construction for optimistic fair exchange. Their construction can lead to efficient setup-free optimistic fair exchange protocols secure in the standard model and the chosen-key model. Very recently, the first efficient ambiguous optimistic fair exchange protocol was proposed in [15]. The new protocol is proven secure in the multi-user setting and the chosen-key model without relying on the random oracle assumption. Undoubtedly, it is more desirable for cryptographic protocols to be proven secure in the chosen-key model. However, in this paper, the security of optimistic fair exchange is considered in the certified-key model (as defined in [10]), since the certified-key model is reasonable and has been widely used in public key cryptography research. In the remainder of this paper, when we say that an optimistic fair exchange protocol is multi-user secure (or secure in the multi-user setting), we mean that the protocol is secure in the multi-user setting defined in [10] (which is in the certified-key model).

1.2 Motivation 

The research on optimistic fair exchange has shown that: 

— The single-user security of optimistic fair exchange does not guarantee multi-user security [10, 24].

— Not all single-user secure optimistic fair exchange protocols are insecure in the multi-user setting [10]. Several single-user secure protocols can be proven secure in the multi-user setting [10].

However, it remains unknown under which conditions single-user secure optimistic fair exchange protocols remain secure in the multi-user setting. We believe that investigating this question will not only provide a further understanding of the security of optimistic fair exchange in the multi-user setting, but can also lead to new constructions of multi-user secure optimistic fair exchange.

1.3 Our Contributions 

This paper focuses on both the theoretical investigation and new constructions of optimistic fair exchange in the multi-user setting.

1. In Section 3, we introduce and define a new property of optimistic fair exchange, which we call Strong Resolution-Ambiguity. Briefly speaking, an optimistic fair exchange protocol has the property of strong resolution-ambiguity if one can transform a partial signature $\sigma'$ into a full signature $\sigma$ using either the signer’s private key or the arbitrator’s private key, and, given such a pair $(\sigma', \sigma)$, it is infeasible to tell which key was used in the conversion. While some existing optimistic fair exchange protocols satisfy strong resolution-ambiguity, this is the first time the notion is addressed and formally defined.

2. For an optimistic fair exchange protocol with strong resolution-ambiguity, we prove that its security in the single-user setting is preserved in the multi-user setting. More precisely, we show that: (1) the security against the signer and the security against the verifier in the single-user setting are preserved in the multi-user setting for optimistic fair exchange protocols with strong resolution-ambiguity, and (2) the security against the arbitrator in the single-user setting is preserved in the multi-user setting (for optimistic fair exchange protocols either with or without strong resolution-ambiguity).

While strong resolution-ambiguity is not a necessary property for (multi-user secure) optimistic fair exchange protocols, our result provides a new approach to the security analysis of optimistic fair exchange protocols in the multi-user setting: for optimistic fair exchange protocols with strong resolution-ambiguity, one only needs to analyze the security in the single-user setting (rather than in the more complex multi-user setting).

3. In Section 4, we provide a new construction of optimistic fair exchange with strong resolution-ambiguity. Our construction is a variant of the optimistic fair exchange protocol from the verifiably encrypted signature scheme proposed in [15]. The protocol in [15] has several desirable properties, e.g., it is setup-free, stand-alone and multi-user secure without random oracles under the computational Diffie-Hellman assumption. Our protocol retains all these properties and is more efficient in generating, transmitting and verifying partial signatures. This, however, is achieved at the cost of a larger key size.

2 Definitions of Optimistic Fair Exchange in the 
Multi-user Setting 

This section reviews the syntax and security definitions of optimistic fair exchange in the multi-user setting [10].

2.1 Syntax of Optimistic Fair Exchange 

A setup-free non-interactive optimistic fair exchange protocol involves three parties: the signer, the verifier and the arbitrator. It is defined by the following efficient algorithms, where an algorithm is called efficient if it is a probabilistic polynomial-time Turing machine.

— Setup^TTP. The arbitrator setup algorithm takes as input a parameter Param, and outputs a secret arbitration key ASK and a public partial verification key APK.

— Setup^User. The user setup algorithm takes as input Param and (optionally) APK, and outputs a private signing key SK and a public verification key PK.

— Sig and Ver. These are similar to signing and verification algorithms in an 
ordinary digital signature scheme. 

• The signing algorithm Sig, run by a signer $U_i$, takes as input $(m, SK_{U_i}, APK)$ and outputs a signature $\sigma_{U_i}$ on the message $m$. In fair exchange protocols, signatures generated by Sig are called full signatures.

• The verification algorithm Ver, run by a verifier, takes as input $(m, \sigma_{U_i}, PK_{U_i}, APK)$ and returns valid or invalid. A signature $\sigma_{U_i}$ is said to be a valid full signature of $m$ under $PK_{U_i}$ if $\mathrm{Ver}(m, \sigma_{U_i}, PK_{U_i}, APK) = \mathrm{valid}$.
— PSig and PVer. These are partial signing and verification algorithms, where PSig together with Res (defined below) is functionally equivalent to Sig.

• The partial signing algorithm PSig, run by a signer $U_i$, takes as input $(m, SK_{U_i}, APK)$ and outputs a signature $\sigma'_{U_i}$ on $m$. To distinguish them from those produced by Sig, signatures generated by PSig are called partial signatures.

• The partial verification algorithm PVer, run by a verifier, takes as input $(m, \sigma'_{U_i}, PK_{U_i}, APK)$ and returns valid or invalid. A signature $\sigma'_{U_i}$ is said to be a valid partial signature of $m$ under $PK_{U_i}$ if $\mathrm{PVer}(m, \sigma'_{U_i}, PK_{U_i}, APK) = \mathrm{valid}$.

— Res. The resolution algorithm Res takes as input a valid partial signature $\sigma'_{U_i}$ of $m$ under $PK_{U_i}$ and the secret arbitration key ASK, and outputs a signature $\sigma_{U_i}$. This algorithm is run by the arbitrator for a party $U_j$ who does not receive the full signature from $U_i$, but possesses a valid partial signature of $U_i$ and a proof that he/she has fulfilled the obligation to $U_i$.

Correctness. If each signature is generated according to the protocol specification, then it should pass the corresponding verification algorithm. Namely,

1. $\mathrm{Ver}(m, \mathrm{Sig}(m, SK_{U_i}, APK), PK_{U_i}, APK) = \mathrm{valid}$.

2. $\mathrm{PVer}(m, \mathrm{PSig}(m, SK_{U_i}, APK), PK_{U_i}, APK) = \mathrm{valid}$.

3. $\mathrm{Ver}(m, \mathrm{Res}(m, \mathrm{PSig}(m, SK_{U_i}, APK), ASK, PK_{U_i}), PK_{U_i}, APK) = \mathrm{valid}$.
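To make the three correctness conditions concrete, the following Python sketch checks them for a toy instance. The instance is our own illustrative assumption: a BLS-style version of the sequential two-party multisignature construction reviewed in Section 3, in which the signer also holds ASK (so, unlike the syntax above, it is not setup-free). Group elements are represented by their discrete logarithms, which makes the code useful for checking algebra only and completely insecure; the names H, pairing, setup_ttp and the other helpers are hypothetical.

```python
import hashlib
import secrets

# Toy, deliberately INSECURE model (illustration only): a group element
# g^a is stored as its exponent a mod Q, so "multiplying" elements adds
# exponents and the pairing e(g^a, g^b) becomes a*b mod Q.
Q = 2**61 - 1  # prime order of the toy group


def H(m: bytes) -> int:
    """Hash-to-group H(m), modelled as a hash to an exponent."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % Q


def pairing(a: int, b: int) -> int:
    return a * b % Q


def setup_ttp():
    ask = secrets.randbelow(Q)
    return ask, ask                      # (ASK, APK = g^ASK as an exponent)


def setup_user(ask: int):
    # Multisignature-style toy: the signer's key is (sk, ASK), PK = g^sk.
    sk = secrets.randbelow(Q)
    return (sk, ask), sk                 # (SK, PK)


def psig(m: bytes, SK, APK) -> int:
    return H(m) * SK[0] % Q              # sigma' = H(m)^sk


def pver(m: bytes, sp: int, PK: int, APK: int) -> bool:
    return pairing(sp, 1) == pairing(H(m), PK)       # e(sigma', g) = e(H(m), PK)


def res(m: bytes, sp: int, ASK: int, PK: int) -> int:
    return (sp + H(m) * ASK) % Q         # sigma = sigma' * H(m)^ASK


def sig(m: bytes, SK, APK) -> int:
    return H(m) * (SK[0] + SK[1]) % Q    # sigma = H(m)^(sk + ASK)


def ver(m: bytes, s: int, PK: int, APK: int) -> bool:
    return pairing(s, 1) == pairing(H(m), (PK + APK) % Q)  # e(sigma, g) = e(H(m), PK*APK)


def correctness_holds(m: bytes) -> bool:
    ASK, APK = setup_ttp()
    SK, PK = setup_user(ASK)
    return (ver(m, sig(m, SK, APK), PK, APK)                         # condition 1
            and pver(m, psig(m, SK, APK), PK, APK)                   # condition 2
            and ver(m, res(m, psig(m, SK, APK), ASK, PK), PK, APK))  # condition 3
```

Because the toy Res is deterministic, `res(m, psig(m, SK, APK), ASK, PK)` also equals `sig(m, SK, APK)` exactly, so in this toy resolved and actual signatures are trivially indistinguishable.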

Resolution-Ambiguity [10, 11, 14, 16, 24]. Any “resolved signature” $\mathrm{Res}(m, \mathrm{PSig}(m, SK_{U_i}, APK), ASK, PK_{U_i})$ is (at least computationally) indistinguishable from the “actual signature” $\mathrm{Sig}(m, SK_{U_i}, APK)$.

Security of Optimistic Fair Exchange. Intuitively, the fairness of an exchange requires that the two parties exchange their items in a fair way, so that either each party obtains the other’s item or neither party does. This requirement consists of the security against signer(s), the security against verifier(s) and the security against the arbitrator, each of which is defined by a game between the adversary and the challenger. During the game, the challenger maintains three initially empty lists: (1) PK-List contains the public keys of created users; (2) PartialSign-List contains the partial signing queries made by the adversary; and (3) Resolve-List contains the resolution queries made by the adversary.

The definitions in the following sections are inspired by those in [10], with modifications which we believe demonstrate the difference between the single-user security and the multi-user security of optimistic fair exchange.

2.2 Security against Signer(s) 

In an optimistic fair exchange protocol, the signer should not be able to generate 
a valid partial signature which cannot be converted into a valid full signature by 
the arbitrator. This property is defined by the following game. 
— Setup. The challenger generates the parameter Param and the arbitrator’s key pair (APK, ASK) by running Setup^TTP. The adversary $\mathcal{A}$ is given Param and APK.

— Queries. Proceeding adaptively, $\mathcal{A}$ can make the following queries.

Creating-User-Queries. $\mathcal{A}$ can create a user $U_i$ by making a creating-user query $(U_i, PK_{U_i})$. In order to convince the challenger to accept $PK_{U_i}$ (i.e., to add $PK_{U_i}$ to the PK-List), $\mathcal{A}$ must prove its knowledge of the legitimate private key $SK_{U_i}$. This can be realized by requiring the adversary to hand over the private key, as suggested in [T5], or to generate a proof of knowledge of the private key¹.

Resolution-Queries. For a resolution query $(m, \sigma', PK)$ satisfying $\mathrm{PVer}(m, \sigma', PK, APK) = \mathrm{valid}$, the challenger first browses PK-List. If $PK \notin$ PK-List, an error symbol “$\perp$” is returned to the adversary. Otherwise, the challenger adds $(m, PK)$ to the Resolve-List (if the pair $(m, PK)$ is not already there) and responds with the output of $\mathrm{Res}(m, \sigma', ASK, PK)$.

— Output. Eventually, $\mathcal{A}$ outputs a triple $(m_f, \sigma'_f, PK^*)$ and wins the game if $PK^* \in$ PK-List, $\mathrm{PVer}(m_f, \sigma'_f, PK^*, APK) = \mathrm{valid}$, and $\mathrm{Ver}(m_f, \mathrm{Res}(m_f, \sigma'_f, ASK, PK^*), PK^*, APK) = \mathrm{invalid}$.
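The bookkeeping in this game can be summarized in a small Python sketch. PVer, Ver and Res are passed in as plain callables with APK and ASK already fixed; that interface, the class name SignerGame and the use of the string "⊥" for the error symbol are assumptions of the sketch, not part of the definition.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Set, Tuple

BOTTOM = "⊥"  # error symbol returned for unregistered public keys


@dataclass
class SignerGame:
    """Challenger bookkeeping for the security-against-signer(s) game.

    pver(m, sigma_p, pk), ver(m, sigma, pk) and res(m, sigma_p, pk) stand
    for PVer, Ver and Res with APK and ASK fixed (a hypothetical interface).
    """
    pver: Callable[[Any, Any, Any], bool]
    ver: Callable[[Any, Any, Any], bool]
    res: Callable[[Any, Any, Any], Any]
    pk_list: Set[Any] = field(default_factory=set)
    resolve_list: Set[Tuple[Any, Any]] = field(default_factory=set)

    def create_user(self, pk: Any, proof_ok: bool) -> None:
        # PK joins PK-List only if the proof of knowledge of SK verified.
        if proof_ok:
            self.pk_list.add(pk)

    def resolve(self, m: Any, sigma_p: Any, pk: Any) -> Any:
        # The game only considers queries with PVer(m, sigma', PK) = valid.
        if not self.pver(m, sigma_p, pk):
            raise ValueError("resolution queries must carry a valid partial signature")
        if pk not in self.pk_list:
            return BOTTOM
        self.resolve_list.add((m, pk))
        return self.res(m, sigma_p, pk)

    def adversary_wins(self, m_f: Any, sigma_p_f: Any, pk_star: Any) -> bool:
        # A wins with a registered PK*, a valid partial signature, and a
        # resolution that does NOT verify as a full signature.
        return (pk_star in self.pk_list
                and self.pver(m_f, sigma_p_f, pk_star)
                and not self.ver(m_f, self.res(m_f, sigma_p_f, pk_star), pk_star))
```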

Let $\mathrm{Adv}^{\mathrm{OFE}}_{\mathcal{A}}$ be the probability that $\mathcal{A}$ wins the above game, taken over the coin tosses made by $\mathcal{A}$ and the challenger. An adversary $\mathcal{A}$ is said to $(t, q_{CU}, q_R, \epsilon)$-break the security against signer(s) if, in time $t$, $\mathcal{A}$ makes at most $q_{CU}$ Creating-User-Queries and $q_R$ Resolution-Queries, and $\mathrm{Adv}^{\mathrm{OFE}}_{\mathcal{A}}$ is at least $\epsilon$.

Definition 1 (Security against Signer(s)). An optimistic fair exchange protocol is $(t, q_{CU}, q_R, \epsilon)$-secure against signer(s) if no adversary $(t, q_{CU}, q_R, \epsilon)$-breaks it.

By setting $q_{CU} = 1$, we can define the security against the signer in the single-user setting: an optimistic fair exchange protocol is $(t, q_R, \epsilon)$-secure against the signer in the single-user setting if no adversary $(t, 1, q_R, \epsilon)$-breaks it.

2.3 Security against Verifier(s) 

Briefly speaking, the security against verifier(s) requires that the verifier should not be able to generate a valid partial signature on a new message, or a valid full signature, without assistance from the signer or the arbitrator.

The first requirement is ensured by the security against the arbitrator: even the arbitrator (who knows more than the verifier) cannot succeed in that attack. This will be defined shortly in Section 2.4. The second requirement is defined as follows.

— Setup. The challenger generates the parameter Param and the arbitrator’s key pair (APK, ASK) by running Setup^TTP. The challenger also generates a key pair $(PK^*, SK^*)$ by running Setup^User, and adds $PK^*$ to PK-List. The adversary $\mathcal{B}$ is given Param, APK and $PK^*$.


1 We will use the latter approach in the proof. 
— Queries. Proceeding adaptively, $\mathcal{B}$ can make all queries defined in Section 2.2 and Partial-Signing-Queries defined as follows.

Partial-Signing-Queries. For a partial-signing query $(m, PK^*)$, the challenger responds with the output of $\mathrm{PSig}(m, SK^*, APK)$. After that, $(m, PK^*)$ is added to the PartialSign-List. ($\mathcal{B}$ is allowed to make Partial-Signing-Queries only about $PK^*$, as the other public keys are created by $\mathcal{B}$.)

— Output. Eventually, $\mathcal{B}$ outputs a pair $(m_f, \sigma_f)$ and wins the game if $(m_f, PK^*) \notin$ Resolve-List and $\mathrm{Ver}(m_f, \sigma_f, PK^*, APK) = \mathrm{valid}$.
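A similar sketch for the verifier game highlights which list each oracle updates, and why only Resolve-List matters for the winning condition. The callables psig, ver and res are hypothetical stand-ins for PSig(·, SK*, APK), Ver(·, ·, PK*, APK) and Res with ASK; for brevity, the sketch omits the PVer check on resolution queries.

```python
from typing import Any, Callable, Set, Tuple


class VerifierGame:
    """Challenger-side lists for the security-against-verifier(s) game (sketch)."""

    def __init__(self, pk_star: Any, psig: Callable, ver: Callable, res: Callable):
        self.pk_star = pk_star
        self.psig, self.ver, self.res = psig, ver, res
        self.pk_list: Set[Any] = {pk_star}            # PK* is created by the challenger
        self.partial_sign_list: Set[Tuple[Any, Any]] = set()
        self.resolve_list: Set[Tuple[Any, Any]] = set()

    def partial_sign(self, m: Any) -> Any:
        sigma_p = self.psig(m)
        self.partial_sign_list.add((m, self.pk_star))  # recorded after answering
        return sigma_p

    def resolve(self, m: Any, sigma_p: Any, pk: Any) -> Any:
        if pk not in self.pk_list:
            return "⊥"
        self.resolve_list.add((m, pk))
        return self.res(m, sigma_p, pk)

    def adversary_wins(self, m_f: Any, sigma_f: Any) -> bool:
        # Holding a partial signature on m_f is allowed; having asked the
        # arbitrator to resolve (m_f, PK*) is not.
        return (m_f, self.pk_star) not in self.resolve_list and self.ver(m_f, sigma_f)
```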

Let $\mathrm{Adv}^{\mathrm{OFE}}_{\mathcal{B}}$ be the probability that $\mathcal{B}$ wins the above game, taken over the coin tosses made by $\mathcal{B}$ and the challenger. An adversary $\mathcal{B}$ is said to $(t, q_{CU}, q_{PS}, q_R, \epsilon)$-break the security against verifier(s) if, in time $t$, $\mathcal{B}$ makes at most $q_{CU}$ Creating-User-Queries, $q_{PS}$ Partial-Signing-Queries and $q_R$ Resolution-Queries, and $\mathrm{Adv}^{\mathrm{OFE}}_{\mathcal{B}}$ is at least $\epsilon$.

Definition 2 (Security against Verifier(s)). An optimistic fair exchange protocol is $(t, q_{CU}, q_{PS}, q_R, \epsilon)$-secure against verifier(s) if no adversary $(t, q_{CU}, q_{PS}, q_R, \epsilon)$-breaks it.

Similarly, we obtain the definition of the security against the verifier in the single-user setting: an optimistic fair exchange protocol is $(t, q_{PS}, q_R, \epsilon)$-secure against the verifier in the single-user setting if no adversary $(t, 0, q_{PS}, q_R, \epsilon)$-breaks it.

2.4 Security against the Arbitrator 

In this section, we will define the security against the arbitrator and prove that 
the security against the arbitrator in the single-user setting is preserved in the 
multi-user setting. 

The security against the arbitrator requires that the arbitrator, without the partial signature on a message $m$, should not be able to produce a valid full signature on $m$. This notion is defined as follows.

— Setup. The challenger generates the parameter Param, which is given to the adversary $\mathcal{C}$.

— Output-I. $\mathcal{C}$ generates the arbitrator’s public key APK and sends it to the challenger. ($\mathcal{C}$ is required to prove knowledge of the legitimate private key ASK.) In response, the challenger generates a key pair $(PK^*, SK^*)$ by running Setup^User and adds $PK^*$ to PK-List. The adversary $\mathcal{C}$ is given $PK^*$.

— Queries. Proceeding adaptively, $\mathcal{C}$ can make Creating-User-Queries (defined in Section 2.2) and Partial-Signing-Queries (defined in Section 2.3).

— Output-II. Eventually, $\mathcal{C}$ outputs a pair $(m_f, \sigma_f)$ and wins the game if $(m_f, PK^*) \notin$ PartialSign-List and $\mathrm{Ver}(m_f, \sigma_f, PK^*, APK) = \mathrm{valid}$.

2 As in almost all previous work on optimistic fair exchange, we assume that signer-arbitrator collusion and verifier-arbitrator collusion do not occur. Please refer to mu for discussions of those attacks.
Let $\mathrm{Adv}^{\mathrm{OFE}}_{\mathcal{C}}$ be the probability that $\mathcal{C}$ wins the above game, taken over the coin tosses made by $\mathcal{C}$ and the challenger. An adversary $\mathcal{C}$ is said to $(t, q_{CU}, q_{PS}, \epsilon)$-break the security against the arbitrator if, in time $t$, $\mathcal{C}$ makes at most $q_{CU}$ Creating-User-Queries and $q_{PS}$ Partial-Signing-Queries, and $\mathrm{Adv}^{\mathrm{OFE}}_{\mathcal{C}}$ is at least $\epsilon$.

Remark 1. In the game, the adversary must first generate the arbitrator’s public key APK before obtaining $PK^*$ or making other queries. This reflects the definition of optimistic fair exchange, as APK could be an input of the algorithms Setup^User and PSig. For concrete protocols where these algorithms do not take APK as input, the adversary can obtain $PK^*$ and/or make partial-signing queries about $PK^*$ before generating APK.

Definition 3 (Security against the Arbitrator). An optimistic fair exchange protocol is $(t, q_{CU}, q_{PS}, \epsilon)$-secure against the arbitrator in the multi-user setting if no adversary $(t, q_{CU}, q_{PS}, \epsilon)$-breaks it.

We can obtain the definition of the security against the arbitrator in the single-user setting: an optimistic fair exchange protocol is $(t, q_{PS}, \epsilon)$-secure against the arbitrator in the single-user setting if no adversary $(t, 0, q_{PS}, \epsilon)$-breaks it. The following theorem shows that the security against the arbitrator in the single-user setting is preserved in the multi-user setting.

Theorem 1. An optimistic fair exchange protocol is $(t, q_{CU}, q_{PS}, \epsilon)$-secure against the arbitrator in the multi-user setting if it is $(t + t_1 q_{CU}, q_{PS}, \epsilon)$-secure against the arbitrator in the single-user setting. Here, $t_1$ denotes the time needed to respond to one creating-user query.

Proof. We denote by $\mathcal{C}_s$ the adversary in the single-user setting and by $\mathcal{C}_m$ the adversary in the multi-user setting. We show how to convert a successful $\mathcal{C}_m$ into a successful $\mathcal{C}_s$. At the beginning, $\mathcal{C}_s$ obtains Param from its challenger in the single-user setting.

— Setup. Param is given to $\mathcal{C}_m$.

— Output-I. Let APK be the arbitrator’s public key created by $\mathcal{C}_m$ in the multi-user setting. APK is sent to $\mathcal{C}_s$’s challenger in the single-user setting. $\mathcal{C}_s$ makes use of $\mathcal{C}_m$ to generate the proof of knowledge, namely $\mathcal{C}_s$ acts as a relay in the proof by forwarding all messages from its challenger to $\mathcal{C}_m$ (and from $\mathcal{C}_m$ to its challenger). At the end of this phase, $\mathcal{C}_s$ is given a public key $PK^*$, which is forwarded to $\mathcal{C}_m$ as its challenge public key in the multi-user setting.

— Queries. We show how $\mathcal{C}_s$ can correctly answer $\mathcal{C}_m$’s queries.

Creating-User-Queries. For a creating-user query $(U_i, PK_{U_i})$, $\mathcal{C}_s$ adds $PK_{U_i}$ to PK-List if $\mathcal{C}_m$ can generate a proof of knowledge of the legitimate private key.

Partial-Signing-Queries. For a partial-signing query $(m, PK^*)$, $\mathcal{C}_s$ forwards it to its own challenger and sends the response to $\mathcal{C}_m$.
— Output-II. Eventually, $\mathcal{C}_m$ outputs a pair $(m_f, \sigma_f)$. $\mathcal{C}_s$ sets $(m_f, \sigma_f)$ as its own output in the single-user setting.

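$\mathcal{C}_s$ wins the game in the single-user setting whenever $\mathcal{C}_m$ wins the game in the multi-user setting: every partial-signing query of $\mathcal{C}_m$ was relayed unchanged, so $(m_f, PK^*)$ is absent from the single-user PartialSign-List as well, and $\sigma_f$ is a valid full signature under the same $PK^*$ and APK. It follows that the success probability of $\mathcal{C}_s$ is at least $\epsilon$ if $\mathcal{C}_m$ can $(t, q_{CU}, q_{PS}, \epsilon)$-break the security against the arbitrator in the multi-user setting.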

It remains to analyze the running time. $\mathcal{C}_s$’s running time is that of $\mathcal{C}_m$ plus the time it takes to answer creating-user queries, each of which we assume takes time at most $t_1$. Therefore, the total running time is $t + t_1 q_{CU}$.

We have shown that, for an optimistic fair exchange protocol, if there is an adversary that $(t, q_{CU}, q_{PS}, \epsilon)$-breaks the security against the arbitrator in the multi-user setting, then there is an adversary that $(t + t_1 q_{CU}, q_{PS}, \epsilon)$-breaks the security against the arbitrator in the single-user setting. This completes the proof of Theorem 1. □
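The reduction in this proof is essentially a relay. The Python sketch below shows its shape under stated assumptions: the single-user challenger is modelled as an object with a `pk_star` attribute and a `partial_sign` method, the proof-of-knowledge check is an injected callable, and the relay of the proof for APK in Output-I is not modelled. All names are hypothetical.

```python
from typing import Any, Callable, Set, Tuple


class ArbitratorReduction:
    """Sketch of C_s from the proof of Theorem 1 (hypothetical interfaces)."""

    def __init__(self, own_challenger: Any, verify_pok: Callable[[Any, Any], bool]):
        self.ch = own_challenger                   # C_s's single-user challenger
        self.verify_pok = verify_pok               # proof-of-knowledge check
        self.pk_list: Set[Any] = {own_challenger.pk_star}
        self.creating_user_queries = 0             # each costs at most t1: total t + t1*q_CU

    def create_user(self, pk: Any, proof: Any) -> bool:
        # Answered locally: no help from the single-user challenger is needed.
        self.creating_user_queries += 1
        if self.verify_pok(pk, proof):
            self.pk_list.add(pk)
            return True
        return False

    def partial_sign(self, m: Any, pk: Any) -> Any:
        # Partial-signing queries are only about PK*; relay them unchanged,
        # so both PartialSign-Lists contain exactly the same pairs.
        if pk != self.ch.pk_star:
            raise ValueError("partial-signing queries must be about PK*")
        return self.ch.partial_sign(m)

    def output(self, m_f: Any, sigma_f: Any) -> Tuple[Any, Any]:
        # C_s submits C_m's forgery unchanged.
        return m_f, sigma_f
```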

Section 3 investigates the conditions under which the security against the signer and the security against the verifier in the single-user setting are preserved in the multi-user setting.

3 Strong Resolution-Ambiguity 

This section investigates a new property of optimistic fair exchange, which we call “Strong Resolution-Ambiguity”. We give the definition of strong resolution-ambiguity and prove that, for optimistic fair exchange protocols with this property, the security against the signer and the security against the verifier in the single-user setting are preserved in the multi-user setting. Before giving the formal definition, we first review a generic construction of optimistic fair exchange [11].

Optimistic Fair Exchange from Sequential Two-Party Multisignature 

A multisignature scheme allows any subgroup of users to jointly sign a document such that a verifier is convinced that each user of the subgroup participated in the signing. To construct an optimistic fair exchange protocol, one can use a simple type of multisignature, called a sequential two-party multisignature. In this construction, the signer first generates two key pairs $(pk, sk)$ and $(APK, ASK)$, where $(pk, APK, ASK)$ is sent to the arbitrator through a secure channel. The signer’s private key SK is the pair $(sk, ASK)$ and the arbitrator’s private key is ASK. The partial signature $\sigma'$ of a message $m$ is an ordinary signature generated using $sk$, and the full signature $\sigma$ is the multisignature generated using $\sigma'$ and ASK. Given a valid partial signature, both the arbitrator and the signer can convert it into a full signature using ASK. (Recall that ASK is the arbitrator’s private key and part of the signer’s private key.) It is thus infeasible to tell who (the signer or the arbitrator) converted the partial signature into the full signature. This is the essential requirement of optimistic fair exchange with strong resolution-ambiguity, which is formally defined as follows.
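The following toy sketch (our own BLS-style instantiation, chosen for illustration; the construction itself is generic) shows why conversion by the signer and by the arbitrator cannot be told apart: both run the same deterministic step with the same ASK. Group elements are represented by their discrete logarithms modulo a prime, so the code checks algebra only and offers no security; all function names are hypothetical.

```python
import hashlib
import secrets

# Toy exponent model (INSECURE, illustration only): g^a is stored as a mod Q.
Q = 2**61 - 1


def H(m: bytes) -> int:
    """Hash-to-group, modelled as hashing to an exponent."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % Q


def keygen():
    """Signer generates (pk, sk) and (APK, ASK); SK = (sk, ASK)."""
    sk, ask = secrets.randbelow(Q), secrets.randbelow(Q)
    return (sk, ask), (sk, ask)          # SK, and (pk, APK) as exponents


def psig(m: bytes, SK) -> int:
    return H(m) * SK[0] % Q              # ordinary signature H(m)^sk


def res(m: bytes, sp: int, ASK: int) -> int:
    return (sp + H(m) * ASK) % Q         # second, sequential signature: sigma' * H(m)^ASK


def convert(m: bytes, sp: int, SK) -> int:
    return res(m, sp, SK[1])             # the signer runs the same step with its copy of ASK


def ver(m: bytes, s: int, pk: int, apk: int) -> bool:
    return s == H(m) * (pk + apk) % Q    # e(sigma, g) == e(H(m), pk * APK)
```

Because `convert` and `res` return the same value on every input, the two conversions are literally indistinguishable in this toy.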
3.1 Definition of Strong Resolution-Ambiguity

We first introduce a probabilistic polynomial-time algorithm Convert which allows the signer to convert a partial signature into a full one. Convert is defined as follows.

— Convert. This algorithm takes as input the signer’s private key $SK_{U_i}$, (optionally) the arbitrator’s public key APK, a message $m$ and its valid partial signature $\sigma'$. The output is the signer’s full signature $\sigma$ on $m$.

In a trivial case, every optimistic fair exchange protocol has an algorithm Convert = Sig. (In this case, the full signature generated by Convert could be totally independent of the partial signature.) Our interest here is to investigate non-trivial Convert algorithms and compare them with the resolution algorithm Res. Recall that, with the knowledge of ASK, one can also convert a partial signature into a full one using Res. This raises the following question: given a valid partial signature $\sigma'$, what are the differences between the full signatures produced by Convert and those produced by Res? The answer to this question inspires the definition of strong resolution-ambiguity.

To formally define strong resolution-ambiguity, we assume that the arbitrator’s key pair satisfies an NP-relation $R_{TTP}$, and that users’ key pairs satisfy another NP-relation $R_U$. An NP-relation $R$ is a subset of $\{0,1\}^* \times \{0,1\}^*$ for which there exists a polynomial $f$ such that $|y| \le f(|x|)$ for all $(x, y) \in R$, and there exists a polynomial-time algorithm for deciding membership in $R$.

In an optimistic fair exchange protocol as defined in Section 2, let (APK, ASK) be any pair in $R_{TTP}$, and let $(PK_{U_i}, SK_{U_i})$ be any pair in $R_U$. For any pair $(m, \sigma')$ satisfying $\mathrm{PVer}(m, \sigma', PK_{U_i}, APK) = \mathrm{valid}$, we define

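$\mathcal{D}^{m,\sigma'}_{\mathrm{Convert}}$: the probability distribution of full signatures produced by $\mathrm{Convert}(m, \sigma', SK_{U_i}, APK)$.

$\mathcal{D}^{m,\sigma'}_{\mathrm{Res}}$: the probability distribution of full signatures produced by $\mathrm{Res}(m, \sigma', ASK, PK_{U_i})$.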

Definition 4 (Strong Resolution-Ambiguity). An optimistic fair exchange protocol is said to satisfy strong resolution-ambiguity if there exists an algorithm Convert as defined above such that $\mathcal{D}^{m,\sigma'}_{\mathrm{Convert}}$ is identical to $\mathcal{D}^{m,\sigma'}_{\mathrm{Res}}$.

Strong Resolution-Ambiguity and Resolution-Ambiguity: A Brief 
Comparison 

An optimistic fair exchange protocol with strong resolution-ambiguity satisfies resolution-ambiguity if Sig is defined as (PSig + Convert), namely the signer first generates a partial signature and then converts it into a full one using Convert. In this case, actual signatures (generated by Sig) are indistinguishable from resolved signatures (generated by Res). However, resolution-ambiguity does not ensure strong resolution-ambiguity, which requires that one can use the signer’s private key to convert a partial signature into a full one, and that this conversion is indistinguishable from the one using the arbitrator’s private key.
3.2 Optimistic Fair Exchange Protocols with/without Strong
Resolution-Ambiguity 

It is evident that the generic construction of optimistic fair exchange from sequential two-party multisignatures [11] (reviewed at the beginning of Section 3) has the strong resolution-ambiguity property, by defining Convert = Res. Below are some other concrete examples of optimistic fair exchange with and without strong resolution-ambiguity.

Optimistic Fair Exchange from Verifiably Encrypted Signatures 
Let OFE-VES denote optimistic fair exchange protocols constructed from verifiably encrypted signatures. If the algorithm Sig is deterministic (e.g., the verifiably encrypted signature scheme in [8]), then OFE-VES has the strong resolution-ambiguity property: for any valid partial signature of $m$, there is only one output of the algorithm Res, namely the unique full signature of $m$. By defining Convert = Sig, $\mathcal{D}^{m,\sigma'}_{\mathrm{Convert}}$ and $\mathcal{D}^{m,\sigma'}_{\mathrm{Res}}$ are identical and the protocols satisfy strong resolution-ambiguity. OFE-VES with probabilistic Sig algorithms can also have the strong resolution-ambiguity property. One example is the optimistic fair exchange protocol from the verifiably encrypted signature scheme proposed in [15]. In [15], the Sig algorithm is the signing algorithm of the Waters signature pTSj, and the partial signature $\sigma'$ is the encryption of the full signature $\sigma$ under APK. After extracting $\sigma$ from $\sigma'$, the arbitrator randomizes $\sigma$ such that the output of Res is a full signature uniformly distributed in the full signature space. This makes the distribution of full signatures produced by Res the same as that of full signatures generated by Convert = Sig.
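The randomization step can be illustrated with Waters signatures in a toy exponent model. The sketch assumes the standard Waters re-randomization, multiplying $(\sigma_1, \sigma_2)$ by $(H^{r'}, g^{r'})$ for a fresh random $r'$, where $H = V_0 \prod_{i \in \mathcal{M}} V_i$; the text above does not spell out the exact step, and the extraction of $\sigma$ from $\sigma'$ is not modelled. A tiny prime keeps the distributions small enough to enumerate exactly; the function names are hypothetical.

```python
from typing import Tuple

# Toy exponent model (INSECURE, illustration only): (g^a, g^b) is stored as
# (a, b) mod P, and H = V_0 * prod_{i in M} V_i is represented by its
# exponent h. A tiny prime lets us enumerate distributions exactly.
P = 101


def waters_sign(x: int, h: int, r: int) -> Tuple[int, int]:
    """Waters signature (g^x * H^r, g^r) with randomness r."""
    return (x + h * r) % P, r % P


def rerandomize(sigma: Tuple[int, int], h: int, r_new: int) -> Tuple[int, int]:
    """Multiply by (H^r', g^r'); the result has randomness r + r'."""
    s1, s2 = sigma
    return (s1 + h * r_new) % P, (s2 + r_new) % P


def waters_verify(sigma: Tuple[int, int], x: int, h: int) -> bool:
    s1, s2 = sigma
    return s1 == (x + h * s2) % P        # e(s1, g) == e(g, g)^x * e(H, s2)


def res_matches_sig(x: int, h: int, r0: int) -> bool:
    """Compare all re-randomized outputs (over r') with all Sig outputs (over r)."""
    extracted = waters_sign(x, h, r0)    # the sigma recovered from sigma'
    res_outputs = sorted(rerandomize(extracted, h, rp) for rp in range(P))
    sig_outputs = sorted(waters_sign(x, h, r) for r in range(P))
    return res_outputs == sig_outputs
```

Because $r' \mapsto r_0 + r'$ is a bijection on $\mathbb{Z}_p$, enumerating all $r'$ for a fixed extracted signature yields every output of Sig exactly once, so a uniform $r'$ reproduces the distribution of a fresh Sig call.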

A Concrete Instance of the Generic Construction in [14]

The generic construction of optimistic fair exchange in [14] is based on a conventional signature scheme and a ring signature scheme, both of which can be constructed efficiently without random oracles. In the protocol, the signer and the arbitrator first generate their own key pairs. The full signature of a message $m$ is a pair $(s_1, s_2)$, where $s_1$ is the signer’s conventional signature on $m$, and $s_2$ is a ring signature on $m$ and $s_1$. Either the signer or the arbitrator is able to generate $s_2$. This construction satisfies strong resolution-ambiguity if the distribution of ring signatures generated by the signer is the same as that of ring signatures generated by the arbitrator (e.g., the 2-user ring signature scheme without random oracles [5]).

A Concrete Protocol without Strong Resolution-Ambiguity 

One example of an optimistic fair exchange protocol without strong resolution-ambiguity is the single-user secure but multi-user insecure optimistic fair exchange protocol proposed in [10]. In this protocol, the full signature of a message $m$ is $\sigma = (r, \delta)$, where $\delta$ is the signer’s conventional signature on “$m \| y$”, $y = f(r)$, and $f$ is a trapdoor one-way permutation. The partial signature is defined as $\sigma' = (y, \delta)$. To convert $(y, \delta)$ into a full signature, the arbitrator uses his/her private key $f^{-1}$ to compute $r = f^{-1}(y)$ and obtains the full signature $(r, \delta)$. Given a message $m$ and its full signature $(r, \delta)$, it is hard to tell whether $(r, \delta)$ was produced by Sig directly, or first generated by PSig and then by Res. Thus,

136 X. Huang et al. 


as shown in [11], the property “resolution-ambiguity” is satisfied. On the other hand, this protocol does not have strong resolution-ambiguity, as $f$ is a trapdoor one-way permutation. Suppose, to the contrary, that there is an algorithm Convert such that, for a partial signature $\sigma'$, the outputs of $\mathrm{Convert}(m, \sigma', SK_{U_i}, f)$ have the same probability distribution as those of $\mathrm{Res}(m, \sigma', f^{-1}, PK_{U_i})$. Note that for $\sigma' = (y, \delta)$, Res outputs a pair $(r, \delta)$ such that $y = f(r)$. It follows that $\mathrm{Convert}(m, \sigma', SK_{U_i}, f)$ must also output $(r, \delta)$ satisfying $y = f(r)$ if the protocol has strong resolution-ambiguity. This breaks the one-wayness of $f$: given $y$, the efficient algorithm Convert finds $r$ such that $f(r) = y$ without the trapdoor $f^{-1}$.

Notice that, given a partial signature $\sigma'$, the signer can generate a full signature $\sigma$ that is indistinguishable from the one converted by the arbitrator. To do so, the signer needs to maintain a list $\{(r, y) : y = f(r)\}$ when he/she produces the partial signature $\sigma' = (y, \delta)$. Later on, for a partial signature $(y, \delta)$, the signer can search the list and find the matching pair $(r, y)$. In this case, the signer can generate a full signature $(r, \delta)$ which is indistinguishable from the one converted by the resolution algorithm Res. However, this approach does not satisfy the definition of Convert, since it requires an additional input $r$. (Recall that the inputs of Convert are only $SK_{U_i}$, $(m, \sigma')$ and APK.)
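The argument can be traced with a toy RSA trapdoor permutation, used here purely as an illustrative stand-in for $f$ (the parameters $N = 61 \cdot 53$, $e = 17$, $d = 2753$ are insecure textbook values). The sketch models only the $r$/$y$ component of the signatures; the conventional signature $\delta$ on $m \| y$ is omitted, and the class and function names are hypothetical.

```python
import secrets

# Toy RSA trapdoor permutation f(r) = r^E mod N (INSECURE textbook values).
N, E, D = 3233, 17, 2753   # N = 61 * 53 and E * D = 1 mod lcm(60, 52)


def f(r: int) -> int:
    return pow(r, E, N)


def f_inv(y: int) -> int:
    """Inversion needs the trapdoor D, i.e. the arbitrator's private key."""
    return pow(y, D, N)


def res_component(y: int) -> int:
    # Res: the arbitrator recovers r = f^{-1}(y) from sigma' = (y, delta).
    return f_inv(y)


class StatefulSigner:
    """Signer who remembers the r behind every y it has published."""

    def __init__(self) -> None:
        self.table = {}                  # the list {(r, y) : y = f(r)}, keyed by y

    def new_partial_component(self) -> int:
        r = secrets.randbelow(N)
        y = f(r)
        self.table[y] = r
        return y                         # sigma' = (y, delta); delta omitted here

    def complete(self, y: int) -> int:
        # Works only thanks to the stored r: this extra state is exactly why
        # the procedure does not meet the definition of Convert.
        return self.table[y]
```

Without `self.table`, recovering `r` from `y` would mean computing `f_inv` without `D`, which is precisely the inversion that one-wayness rules out.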

3.3 Security of Optimistic Fair Exchange Protocols with Strong 
Resolution-Ambiguity 

Theorem 1 has shown that the security against the arbitrator in the single-user setting is preserved in the multi-user setting. This section considers the other two security notions, and we will prove that:

1. For optimistic fair exchange protocols with strong resolution-ambiguity, the security against the signer in the single-user setting is preserved in the multi-user setting (Theorem 2).

2. For optimistic fair exchange protocols with strong resolution-ambiguity, the security against the verifier in the single-user setting is preserved in the multi-user setting (Theorem 3).

Theorem 2. An optimistic fair exchange protocol with strong resolution-ambiguity is $(t, q_{CU}, q_R, \epsilon)$-secure against signers in the multi-user setting if it is $(t + t_1 q_{CU} + t_2 q_R, q_R, \epsilon / q_{CU})$-secure against the signer in the single-user setting. Here, $t_1$ is the time unit that depends on the validity of the proof of knowledge, and $t_2$ is the time unit that depends on the algorithm Convert in the protocol.

Proof. We denote by $\mathcal{A}_s$ the adversary in the single-user setting and by $\mathcal{A}_m$ the adversary in the multi-user setting. We use the standard method and show that, for an optimistic fair exchange protocol with strong resolution-ambiguity, a successful $\mathcal{A}_m$ can be converted into a successful $\mathcal{A}_s$. We first give a high-level description of the proof.

$\mathcal{A}_s$ acts as the challenger of $\mathcal{A}_m$ in the proof and answers all queries from the latter. $\mathcal{A}_s$ sets the challenge public key $PK^*$ of $\mathcal{A}_m$ as its own challenge public key, and sets $\mathcal{A}_m$’s output as its own output. The most difficult part of the proof is how $\mathcal{A}_s$ can correctly answer resolution queries from $\mathcal{A}_m$. For resolution queries related to $PK^*$, $\mathcal{A}_s$ can use its own challenger to generate correct responses. However, this is not feasible for resolution queries about other public keys (since $\mathcal{A}_s$’s challenger only responds to queries about $PK^*$). Fortunately, such queries can be correctly answered by $\mathcal{A}_s$ if the optimistic fair exchange protocol has strong resolution-ambiguity. For a resolution query $(m, \sigma', PK_{U_i})$, $\mathcal{A}_s$ can convert $\sigma'$ into a full signature $\sigma$ using the algorithm Convert and the private key $SK_{U_i}$. Due to Definition 4, this perfectly simulates the real game between $\mathcal{A}_m$ and the challenger in the multi-user setting. The private key $SK_{U_i}$ can be extracted by $\mathcal{A}_s$ due to the validity of the proof of knowledge required in the creating-user phase.

The details of the proof appear in the full version of this paper. □ 
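The case analysis for resolution queries in this proof sketch can be written as a small simulator. The callables `own_resolve` and `convert`, and the dictionary of extracted keys, are hypothetical stand-ins; in particular, how $SK_{U_i}$ is extracted from the proof of knowledge depends on the protocol and is not modelled.

```python
from typing import Any, Callable, Dict

BOTTOM = "⊥"


class ResolutionSimulator:
    """How A_s could answer A_m's resolution queries (proof of Theorem 2).

    pk_star: the public key A_s handles through its own challenger.
    own_resolve(m, sigma_p): A_s's single-user resolution oracle for pk_star.
    convert(m, sigma_p, sk): the protocol's Convert algorithm with APK fixed.
    """

    def __init__(self, pk_star: Any, own_resolve: Callable, convert: Callable):
        self.pk_star = pk_star
        self.own_resolve = own_resolve
        self.convert = convert
        self.extracted_sk: Dict[Any, Any] = {}

    def register_user(self, pk: Any, sk_from_extractor: Any) -> None:
        # SK obtained from the proof of knowledge in the creating-user phase.
        self.extracted_sk[pk] = sk_from_extractor

    def resolve(self, m: Any, sigma_p: Any, pk: Any) -> Any:
        if pk == self.pk_star:
            return self.own_resolve(m, sigma_p)          # relay to own challenger
        if pk not in self.extracted_sk:
            return BOTTOM                                 # PK not in PK-List
        # Strong resolution-ambiguity: Convert under SK has exactly the
        # distribution of Res under ASK, so A_m sees no difference.
        return self.convert(m, sigma_p, self.extracted_sk[pk])
```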

Theorem 3. An optimistic fair exchange protocol with strong resolution-ambiguity is $(t, q_{CU}, q_{PS}, q_R, \epsilon)$-secure against verifiers in the multi-user setting if it is $(t + t_1 q_{CU} + t_2 q_R, q_{PS}, q_R, \epsilon)$-secure against the verifier in the single-user setting. Here, $t_1$ is the time unit that depends on the validity of the proof of knowledge, and $t_2$ is the time unit that depends on the algorithm Convert in the protocol.

Proof. The details of the proof appear in the full version of this paper. 

Remark 2. Our analysis only shows that strong resolution-ambiguity is a sufficient condition for single-user secure optimistic fair exchange protocols to remain secure in the multi-user setting. It is not a necessary property for (multi-user secure) optimistic fair exchange protocols.

4 A New Optimistic Fair Exchange Protocol with Strong 
Resolution-Ambiguity 

A new optimistic fair exchange protocol with strong resolution-ambiguity is proposed in this section. The protocol is based on the Waters signature [T5] from bilinear mappings. Definitions of bilinear mappings and of the computational Diffie-Hellman assumption can be found in [HI].

4.1 The Proposed Protocol 

Let $(G, G_T)$ be bilinear groups of prime order $p$ and let $g$ be a generator of $G$.
Let $e$ denote the bilinear mapping $G \times G \to G_T$. Let $n$ be the bit length of the
message to be signed. For an element $m \in \{0,1\}^n$, let $\mathcal{M} \subseteq \{1, 2, \ldots, n\}$ be the
set of all $i$ for which the $i$-th bit $m_i$ is 1. The parameter Param is $(G, G_T, p, e, n)$.

— Setup_TTP. Given Param, the arbitrator chooses a random number $w \in \mathbb{Z}_p$
and calculates $W = g^w$. The arbitrator's public key APK is $W$, and the
private key ASK is $w$.

— Setup_User. Given Param, this algorithm outputs a private signing key
$SK_{U_i} = (x_{U_i}, y_{U_i})$ and a public verification key $PK_{U_i} = (X_{U_i}, Y_{U_i}, v_{U_i})$, where

1. $x_{U_i}$ and $y_{U_i}$ are randomly chosen in $\mathbb{Z}_p$;

2. $X_{U_i} = e(g, g)^{x_{U_i}}$ and $Y_{U_i} = g^{y_{U_i}}$; and

3. $v_{U_i}$ is a vector consisting of $n + 1$ elements $V_0, V_1, V_2, \ldots, V_n$. All these
elements are randomly selected in $G$.

— Sig. Given a message $m$, the signer $U_i$ uses the private key $x_{U_i}$ to generate
a Waters signature $\sigma = (\sigma_1, \sigma_2)$, where $\sigma_1 = g^{x_{U_i}} \cdot (V_0 \prod_{i \in \mathcal{M}} V_i)^r$, $\sigma_2 = g^r$,
and $r$ is a random number in $\mathbb{Z}_p$.

— Ver. Given a message-signature pair $(m, \sigma)$ and $U_i$'s public key $PK_{U_i} =
(X_{U_i}, Y_{U_i}, v_{U_i})$, this algorithm outputs valid if $e(\sigma_1, g) = X_{U_i} \cdot
e(V_0 \prod_{i \in \mathcal{M}} V_i, \sigma_2)$. Otherwise, this algorithm outputs invalid.

— PSig. Given a message $m$ and the arbitrator's public key $W$, the signer $U_i$
first runs Sig to obtain a full signature $(\sigma_1, \sigma_2)$. After that, $U_i$ calculates
$\sigma'_1 = \sigma_1 \cdot W^{y_{U_i}}$ and $\sigma'_2 = \sigma_2$. The partial signature $\sigma'$ is $(\sigma'_1, \sigma'_2)$.

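— PVer. Given a message-partial signature pair $(m, \sigma')$, $U_i$'s public key $PK_{U_i}$,
and the arbitrator's public key APK (which is $W$), one parses $\sigma'$ as $(\sigma'_1, \sigma'_2)$. This
algorithm outputs valid if
$e(\sigma'_1, g) = X_{U_i} \cdot e(Y_{U_i}, W) \cdot e(V_0 \prod_{i \in \mathcal{M}} V_i, \sigma'_2)$. Otherwise, it outputs invalid.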

— Res. Given a valid partial signature $\sigma'$ of the message $m$ under a public
key $PK_{U_i} = (X_{U_i}, Y_{U_i}, v_{U_i})$, the arbitrator first parses $\sigma'$ as $(\sigma'_1, \sigma'_2)$. After
that, the arbitrator uses the private key $w$ to calculate $\sigma_1 = \sigma'_1 \cdot (Y_{U_i})^{-w}$
and $\sigma_2 = \sigma'_2$. The arbitrator then chooses a random number $r' \in \mathbb{Z}_p$ and
calculates $\sigma''_1 = \sigma_1 \cdot (V_0 \prod_{i \in \mathcal{M}} V_i)^{r'}$ and $\sigma''_2 = \sigma_2 \cdot g^{r'}$. The output of the
algorithm Res is $(\sigma''_1, \sigma''_2)$.
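To check the algebra of Setup_TTP, Setup_User, Sig, Ver, PSig, PVer and Res, the following is a minimal Python sketch in a deliberately insecure "exponent" model, which is an assumption made only for illustration: an element $g^a$ of $G$ is stored as its discrete logarithm $a$ modulo a prime, group multiplication becomes addition of exponents, and $e(g^a, g^b) = e(g,g)^{ab}$ becomes multiplication. The stand-in prime $2^{127}-1$, the message length, the bit-numbering convention and all function names are hypothetical. The sketch only confirms that an honest partial signature passes PVer and that the output of Res passes Ver; it says nothing about security.

```python
import secrets

# Toy exponent model (ASSUMPTION, insecure): g^a in G is stored as a mod P,
# e(g,g)^c in G_T as c mod P, and e(g^a, g^b) = e(g,g)^(a*b) becomes a*b mod P.
P = 2**127 - 1   # prime stand-in for the group order p
N_BITS = 8       # hypothetical message length n

def rand_zp():
    return secrets.randbelow(P)

def pair(a, b):
    """e(g^a, g^b) in exponent form."""
    return a * b % P

def waters_base(v, m):
    """V_0 * prod_{i in M} V_i; bit i of m (i = 1 is the least significant bit, an assumption)."""
    total = v[0]
    for i in range(1, N_BITS + 1):
        if (m >> (i - 1)) & 1:
            total += v[i]
    return total % P

def setup_ttp():
    w = rand_zp()
    return w, w                      # ASK = w; APK = W = g^w, whose exponent is also w

def setup_user():
    x, y = rand_zp(), rand_zp()
    v = [rand_zp() for _ in range(N_BITS + 1)]
    return (x, y), (x, y, v)         # SK = (x, y); PK = (X = e(g,g)^x, Y = g^y, v) in exponent form

def sig(sk, pk, m):
    r = rand_zp()
    return ((sk[0] + waters_base(pk[2], m) * r) % P, r)      # (g^x (V_0 prod V_i)^r, g^r)

def ver(pk, m, s):
    X, _, v = pk
    return pair(s[0], 1) == (X + pair(waters_base(v, m), s[1])) % P

def psig(sk, pk, W, m):
    s1, s2 = sig(sk, pk, m)
    return ((s1 + W * sk[1]) % P, s2)                          # sigma'_1 = sigma_1 * W^y

def pver(pk, W, m, sp):
    X, Y, v = pk
    return pair(sp[0], 1) == (X + pair(Y, W) + pair(waters_base(v, m), sp[1])) % P

def res(w, pk, m, sp):
    _, Y, v = pk
    s1 = (sp[0] - Y * w) % P                                   # sigma'_1 * Y^(-w)
    r2 = rand_zp()                                             # fresh randomness r'
    return ((s1 + waters_base(v, m) * r2) % P, (sp[1] + r2) % P)

ask, apk = setup_ttp()
sk, pk = setup_user()
msg = 0b10110010
partial = psig(sk, pk, apk, msg)
full = res(ask, pk, msg, partial)
print(pver(pk, apk, msg, partial), ver(pk, msg, full))      # prints: True True
```

Since Convert is the same as Sig in this protocol, `sig(sk, pk, msg)` also plays the role of Convert here; both it and `res` output Waters signatures with fresh, uniformly distributed randomness.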

Analysis of Our Protocol. It is evident that our protocol is setup-free and 
stand-alone. We show that it also satisfies strong resolution-ambiguity. 

One can find an algorithm Convert, which is the same as Sig, such that given
any partial signature $\sigma'$, the outputs of Convert are indistinguishable from those
produced by Res, both of which are uniformly distributed in the valid signature
space of the Waters signature. Thus, the proposed protocol also satisfies strong
resolution-ambiguity.

The following theorem shows that the protocol is secure in the multi-user 
setting. 

Theorem 4. The proposed protocol is multi-user secure under the computational
Diffie-Hellman assumption.

Proof. The details of the proof appear in the full version of this paper. □ 

4.2 Comparison to Previous Protocols 

Table 1 compares the known optimistic fair exchange protocols that have the
same properties as the newly proposed one (namely, non-interactive, setup-free,
stand-alone, and multi-user secure without random oracles). The comparison is
made in the following aspects: (1) the underlying complexity assumption, (2) the
partial signature size and full signature size, and (3) the computational cost
of signing and verifying partial signatures and full signatures. We consider the
cost of signing and verifying partial signatures because the signer must generate
a partial signature in each exchange, which will be verified by the verifier and
may also be checked again by the arbitrator. Therefore, the efficiency of signing
and verifying partial signatures is at least as important as that of full signatures.

Table 1. Multi-user Secure Stand-Alone and Setup-Free Optimistic Fair Exchange
Protocols without Random Oracles

                            Our Protocol      [15]              [3D]             [18]
Complexity Assumption       CDH               CDH               CT-CDH           SDH
Full Signature              Waters [19]       Waters [19]       Waters [19]      BB [7]
Partial Signature Size      2|G|              3|G|              2|G|             2|G| + |Z_p|
Partial Signing Cost        C_W + 1 Exp*_G    C_W + 2 Exp*_G    C_W              C_BB + 2 Exp_G
Partial Verification Cost   2 BM + 1 BM*      3 BM              2 BM + 1 BM*     2 BM + 4 BM*

Notations.

CDH: Computational Diffie-Hellman assumption.
CT-CDH: Chosen-target computational Diffie-Hellman assumption [6].
SDH: Strong Diffie-Hellman assumption.
|G|: bit length of an element in G; |Z_p|: bit length of an element in Z_p.
C_W: Computational cost of generating one Waters signature [19].
C_BB: Computational cost of generating one BB signature [7].
Exp_G: Exponentiation in G; Exp*_G: Pre-computable exponentiation in G.
BM: Bilinear mapping; BM*: Pre-computable bilinear mapping.

In Table 1, the most efficient protocol is the one constructed from the verifiably
encrypted signature scheme in [18], whose security relies on the strong
Diffie-Hellman (SDH) assumption. The other three protocols are all based on the
Waters signature, but the security of the protocol in [3D] can only be reduced to
a stronger assumption: the chosen-target computational Diffie-Hellman (CT-CDH)
assumption. Our protocol and the one proposed in [15] are designed in a similar
manner. Compared with [15], our protocol has a shorter partial signature
and is more efficient in signing and verifying partial signatures. This is
achieved at the cost of a larger key size (one more pair $(y_{U_i}, Y_{U_i}) \in \mathbb{Z}_p \times G$).

5 Conclusion 

This paper shows several new results about optimistic fair exchange in the
multi-user setting. We formally defined strong resolution-ambiguity in optimistic
fair exchange and demonstrated several concrete optimistic fair exchange protocols
with that property. In the certified-key model, we proved that for optimistic
fair exchange protocols with strong resolution-ambiguity, security in the
single-user setting guarantees security in the multi-user setting. In addition
to these theoretical investigations, we proposed a new construction of optimistic
fair exchange with strong resolution-ambiguity. The new protocol is setup-free,
stand-alone, and provably secure in the multi-user setting without random
oracles.
Abstract. Network coding offers the potential to increase throughput
and improve robustness without any centralized control. Unfortunately,
network coding is highly susceptible to “pollution attacks” in which
malicious nodes modify packets improperly so as to prevent message recovery
at the recipient(s); such attacks cannot be prevented using standard
end-to-end cryptographic authentication because network coding mandates
that intermediate nodes modify data packets in transit.

Specialized “network coding signatures” addressing this problem have
been developed in recent years using homomorphic hashing and
homomorphic signatures. We contribute to this area in several ways:

— We show the first homomorphic signature scheme based on the RSA 
assumption (in the random oracle model). 

— We give a homomorphic hashing scheme that is more efficient than 
existing schemes, and which leads to network coding signatures based 
on the hardness of factoring (in the standard model). 

— We describe variants of existing schemes that reduce the communication
overhead for moderate-size networks, and improve computational
efficiency (in some cases quite dramatically; e.g., we achieve
a 20-fold speedup in signature generation at intermediate nodes).

Underlying our techniques is a modified approach to random linear
network coding where instead of working in a vector space over a field, we
work in a module over the integers (with small coefficients).


1 Introduction 

Network coding [2,18] offers an alternative, decentralized approach to traditional
multicast routing. We consider a network setting where a source node has a file 
that it wants to distribute to a set of target nodes. The source partitions the file 
into m packets, which it transmits to its neighboring nodes. Further transmission
happens through intermediate nodes that receive packets via incoming links and
produce modified packets sent over outgoing links. These outgoing packets are

* This work was supported by the US Army Research Laboratory and the UK Ministry 
of Defence under agreement number W911NF-06-3-0001. 

** Work done while visiting IBM, and supported also by NSF CAREER 
computed as linear combinations of incoming packets, where packets are viewed
as vectors in a vector space over some field. (See further discussion in Section 2.)
We focus on the case of random linear network coding DM, where scalars are
chosen by each intermediate node at random from the underlying field. This
strategy induces a fully decentralized solution to the routing problem since nodes
do not need to coordinate their actions.

Target nodes reconstruct the original file sent by the source using the packets
they receive. This can be done if the intermediate nodes augment each vector they
send with m additional coding coordinates that encode the linear combination
that resulted in that vector. A target that receives a set of augmented vectors
for which the coding coordinates induce a full-rank matrix can recover the file
sent by the source via simple matrix inversion. (See Section 2.1.) A fundamental
question is: what is the decoding probability at the targets, i.e., what is the
probability with which a target is able to reconstruct the original file? The
network coding literature shows that small fields (e.g., $\mathbb{F}_{2^8}$) provide good
decoding probability for sufficiently connected networks.

Although network coding can increase throughput and reliability relative to 
alternative techniques, it is susceptible to pollution attacks in which malicious 
nodes inject invalid packets that prevent reconstruction of the file at the targets. 
(An invalid packet is any packet that is not in the linear span of the original 
augmented vectors sent by the source.) Due to the way vectors are propagated 
and combined in the network, a single invalid packet injected by an attacker can 
invalidate many more packets further downstream. This constitutes a serious 
denial of service attack which can be mounted effortlessly. 

Two naive solutions to this problem are easily seen to be inapplicable. Having
the source sign the file prevents a target node from reconstructing an incorrect
file, but does not enable the target to efficiently reconstruct the correct file in
the first place. (Moreover, it does not provide any way for intermediate nodes to
drop invalid packets they receive.) Having the source sign each augmented vector
it sends (using a standard signature scheme) is also of no help, since
intermediate nodes are supposed to modify vectors in transit. Prior work has shown,
however, that dedicated network coding signatures can be used to address
pollution attacks. Such signatures have been based on two primitives: homomorphic
hash functions mm or homomorphic signatures mm. In both cases,
homomorphic properties ensure that the signature (or hashing) operation on a linear
combination of vectors results in a corresponding homomorphic combination of
signatures (or hash values). See Section 2.2 for further details.

Constructions of homomorphic hash functions are well known, and can be 
implemented over any prime-order group where the discrete logarithm problem 
is hard. Building homomorphic signatures is more challenging. So far the only 
known construction is based on bilinear groups [6] and involves costly pairing 
operations. In particular, network coding signatures based on homomorphic
signatures are computationally more expensive than those built from homomorphic
hashing. However, the latter are less communication-efficient since they require
each packet transmitted to be sent along with some “authentication data” whose
length is proportional to m (the number of file vectors). One drawback of both
approaches is that they replace the small fields used in “standard” network
coding with very large fields appropriate for cryptography. For example, instead of
using vectors over an 8-bit field as in traditional network coding, the
cryptographic approaches use vectors over a 160-bit field. This increases both
the communication and computational overhead.

Our Contributions. We present new and improved network coding signatures.
First, we show the first homomorphic signature scheme based on the RSA
assumption in the random oracle model.¹ It offers more efficient
processing at the intermediate nodes compared to the scheme of [6], which is
based on bilinear groups and pairings. The bandwidth overhead is also lower for
networks of moderate size (e.g., where the maximum path length between source
and target nodes is 20-30 hops).

We also present a new homomorphic hashing scheme that is quite efficient.
Treating each information vector $v$ as a single (large) integer, we define our hash
function simply as $H_N(v) = 2^v \bmod N$ for a composite $N$. This hash function
is homomorphic over the integers and can be proven collision resistant based on
the hardness of factoring. This construction leads to a network coding signature
scheme based on the factoring assumption and without random oracles.
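The homomorphic property of $H_N$ can be checked numerically. The following is a minimal sketch under two stated assumptions: a toy modulus $N = 61 \cdot 53$ (far too small for any security; it only illustrates the algebra), and a hypothetical `pack` encoding that places each nonnegative vector coordinate in its own 32-bit slot of one large integer. The point it shows is that $2^{a x + b y} = (2^x)^a (2^y)^b \bmod N$ when the combination $a x + b y$ is taken over the integers.

```python
N = 61 * 53   # toy modulus (both factors prime); a real N would be a large RSA-type modulus

def h_n(v):
    """H_N(v) = 2^v mod N, for a nonnegative integer v."""
    return pow(2, v, N)

def pack(coords, width=32):
    """Hypothetical encoding: store each nonnegative coordinate in its own width-bit slot."""
    return sum(c << (width * i) for i, c in enumerate(coords))

x = [3, 14, 15, 92]
y = [65, 35, 89, 79]
a, b = 7, 250                                   # small integer coefficients
combo = [a * xi + b * yi for xi, yi in zip(x, y)]

# Packing is linear over the integers, so H_N is homomorphic on packed vectors.
assert pack(combo) == a * pack(x) + b * pack(y)
assert h_n(pack(combo)) == pow(h_n(pack(x)), a, N) * pow(h_n(pack(y)), b, N) % N
```

The combination is computed over the integers with no modular reduction of the exponent, which matches the text's remark that the hash is homomorphic over the integers.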

A core technique we use for both the above constructions is to apply network
coding in a module over the integers rather than in a vector space over a field, as
is traditionally done. By working over the integers we enable the homomorphic
properties of the above two schemes (where the group order is unknown), and
furthermore can work with small coefficients (that need not be cryptographically
large). This has the immediate effect of improving the computation at
intermediate nodes, and it also reduces the total bandwidth overhead for networks with
moderate-length paths between source and targets.

We must analyze how this change from working over a field to working over the
integers affects the decoding probability. We show that if the integer coefficients
are taken from a set $Q = \{0, \ldots, q-1\}$ for prime $q$, then the decoding probability
is at least as good as when working over the field $\mathbb{F}_q$; thus we conclude that using 8-bit
coefficients is good enough for most applications.

The ability to perform network coding with small integer coefficients
also allows us to improve the performance of existing schemes. We show that by
choosing coefficients from a small set Q as above (but still performing
computations modulo the large prime p as required by prior schemes) we can significantly
improve performance: e.g., we obtain roughly a 20-fold improvement in signature
generation time at intermediate nodes and a reduction in the communication
overhead as well.


1 Yu et al. [20] recently proposed an RSA-based homomorphic signature scheme, but
their scheme is essentially flawed (e.g., no signature, even one produced by an honest
source, ever passes verification). The problem is that Yu et al. incorrectly assume
(cf. equations (11) and (12) in Section III-B of [20]) that for integers $A, b, d$, a prime
$e$, and an RSA composite $N$, it holds that $(A^b \bmod e)^d \bmod N = (A \bmod e)^{bd} \bmod N$.
Organization. Section 2 reviews network coding and existing network coding
signature schemes. In Section 3 we discuss network coding over the integers and
show how this translates into performance improvements for existing network
coding signature schemes. We present our RSA-based homomorphic signature
scheme in Section 4 and our factoring-based homomorphic hashing scheme in
Section 5.

2 Background 

2.1 Network Coding 

We present a high-level description of linear network coding (the only type with
which we are concerned in this work); for further details see [T^]. In this setting,
we have a network with a distinguished node S, called the source, and a subset
of nodes known as targets. The objective is for S to transmit a file F to all the
target nodes, where F is represented as a matrix containing the m (row) vectors
$v^{(1)}, \ldots, v^{(m)} \in \mathbb{F}^n$ over some finite field $\mathbb{F}$.

The source first creates m augmented vectors $w^{(1)}, \ldots, w^{(m)}$ defined as

$$w^{(i)} = (0, \ldots, 0, 1, 0, \ldots, 0 \,\|\, v^{(i)}) \in \mathbb{F}^{m+n};$$

i.e., each original vector $v^{(i)}$ of the file is prepended with the vector of length
m containing a single ‘1’ in the $i$-th position. These augmented vectors are sent
by the source to its neighboring nodes.

Each (well-behaved) intermediate node I in the network processes packets
(i.e., incoming vectors) as follows. Upon receiving packets $w^{(1)}, \ldots, w^{(\ell)} \in \mathbb{F}^{m+n}$
on its $\ell$ incoming communication edges, I computes a packet $w$ for each of its
outgoing links as a linear combination of the packets that it received. That is,
each outgoing packet $w$ transmitted by I takes the form $w = \sum_{i=1}^{\ell} \alpha_i w^{(i)}$, where
$\alpha_i \in \mathbb{F}$. We say a vector $w$ transmitted in the network (in the scenario above) is
valid if it lies in the linear span of the original augmented vectors $w^{(1)}, \ldots, w^{(m)}$.
It is easy to see that if all nodes follow the protocol honestly, then every packet
transmitted in the network is valid.

Different strategies for choosing the coefficients $\alpha_i$ yield different variants of
network coding. When the $\{\alpha_i\}$ are chosen randomly and independently by each
intermediate node, for each of its outgoing communication links, the resulting
scheme is referred to as random linear network coding mm. When analyzing
efficiency, we assume random linear network coding is used; our constructions,
however, ensure security regardless of how the coefficients are chosen.

To recover the original file, a target node must receive m (valid) vectors
$\{w^{(i)} = (u^{(i)} \| v^{(i)})\}_{i=1}^{m}$ for which $u^{(1)}, \ldots, u^{(m)}$ are linearly independent. If we
define a matrix U whose rows are the vectors $u^{(1)}, \ldots, u^{(m)}$ and a matrix V
whose rows are the vectors $v^{(1)}, \ldots, v^{(m)}$, the original file can be recovered as

$$F = U^{-1} V. \qquad (1)$$
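The steps above (augmenting, random recombination at intermediate nodes, and decoding via Eq. (1)) can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the prime field $\mathbb{F}_{257}$ instead of $\mathbb{F}_{2^8}$ (so that ordinary modular arithmetic suffices), a two-hop path with a single intermediate node, and hypothetical helper names `augment`, `combine` and `decode`. Decoding row-reduces the received $[U \,\|\, V]$ to $[I \,\|\, U^{-1}V]$.

```python
import random

Q = 257  # prime field size used for illustration (close to the 8-bit fields in the text)

def augment(file_rows):
    """w^(i) = (e_i || v^(i)): prepend the i-th unit vector of length m to each file vector."""
    m = len(file_rows)
    return [[1 if j == i else 0 for j in range(m)] + list(row) for i, row in enumerate(file_rows)]

def combine(packets, rng):
    """One outgoing packet of an intermediate node: a random linear combination mod Q."""
    coeffs = [rng.randrange(Q) for _ in packets]
    return [sum(c * p[k] for c, p in zip(coeffs, packets)) % Q for k in range(len(packets[0]))]

def decode(received, m):
    """Recover F = U^{-1} V from m packets (u || v) by Gauss-Jordan elimination mod Q.
    Returns None if the u-portions are linearly dependent."""
    rows = [list(r) for r in received]
    for col in range(m):
        pivot = next((r for r in range(col, m) if rows[r][col] % Q), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, Q)          # modular inverse (Python 3.8+)
        rows[col] = [x * inv % Q for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % Q for x, y in zip(rows[r], rows[col])]
    return [row[m:] for row in rows]

rng = random.Random(2024)
m, n = 3, 5
F = [[rng.randrange(Q) for _ in range(n)] for _ in range(m)]
source = augment(F)
hop1 = [combine(source, rng) for _ in range(m)]     # packets leaving one intermediate node
received = [combine(hop1, rng) for _ in range(m)]   # packets reaching the target
recovered = decode(received, m)
print("recovered" if recovered == F else "U was singular; wait for more packets")
```

A real target would keep collecting packets until the u-portions reach full rank rather than giving up after m of them.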
Assuming the coefficients are chosen randomly and independently by the
intermediate nodes, the decoding probability (i.e., the probability with which a
given target node will be able to recover the file, or, equivalently, the probability
with which a given target node will receive m linearly independent vectors in
the sense required above) is determined by the network topology and the
size of the field $\mathbb{F}$. To minimize the communication overhead (due to the first
m coordinates of every transmitted vector), it is desirable to keep $|\mathbb{F}|$ as small
as possible; on the other hand, choosing $|\mathbb{F}|$ too small would reduce the decoding
probability too much. For typical networks encountered in practice, taking
$|\mathbb{F}| \approx 256$ has been shown to give a decoding probability of better than 99%.
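As a rough sanity check on why a field of size about 256 suffices, consider an idealized model that ignores network topology (an assumption, not the analysis the text refers to): the target's m × m coefficient matrix U has i.i.d. uniform entries from a field of size q. It is then invertible with probability $\prod_{i=1}^{m} (1 - q^{-i})$, a standard counting fact. The sketch computes this product and cross-checks it by sampling over a small prime field; the helper names are hypothetical.

```python
import random

def full_rank_probability(q, m):
    """P[m x m matrix with i.i.d. uniform entries over a field of size q is invertible]
    = prod_{i=1}^{m} (1 - q^{-i})."""
    prob = 1.0
    for i in range(1, m + 1):
        prob *= 1.0 - q ** (-i)
    return prob

def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field F_p (Gaussian elimination)."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col] * inv % p
                rows[r] = [(x - f * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def estimate(p, m, trials, rng):
    """Monte Carlo estimate of the same probability for a prime field F_p."""
    hits = sum(
        rank_mod_p([[rng.randrange(p) for _ in range(m)] for _ in range(m)], p) == m
        for _ in range(trials)
    )
    return hits / trials

for q in (2, 16, 256):
    print(f"q = {q:3d}: P[invertible] = {full_rank_probability(q, 32):.5f}")
```

For q = 256 the product is slightly above 0.996, consistent with the better-than-99% figure quoted above; real networks can do worse than this idealized model when they are poorly connected.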

2.2 Network Coding Signatures 

We have already discussed the problem of pollution attacks, and why standard 
cryptographic mechanisms are incapable of preventing them. Early efforts to 
deal with pollution attacks focused on information-theoretic solutions mm 
that use error-correction techniques to ensure that targets can reconstruct the 
file as long as the ratio of valid to invalid vectors they receive is sufficiently high. 
Unfortunately, these techniques (inherently) impose limitations on the number 
of nodes the adversary can corrupt, the number of packets that can be modified, 
and/or the number of links on which the adversary can eavesdrop. Researchers 
have more recently turned to cryptographic approaches that place no restrictions 
on the adversary (other than assuming that the adversary is computationally 
bounded) |17j7|21l6] . These approaches give network coding signature schemes 
that allow anyone holding the public key² of the source to determine whether
a given vector is valid. This allows target nodes to reject invalid vectors before
reconstructing the file; it also allows intermediate nodes to filter out invalid
vectors when generating their outgoing messages. For formal definitions of network
coding signatures and their security requirements, see [6]. 

Two classes of network coding signature schemes are known: those based on
homomorphic hashing, and those using homomorphic signatures. We describe
these now at a high level.

Schemes based on homomorphic hashing |17U2ltf6] . A homomorphic hash 
function H is a collision-resistant hash function with the property that for any
vectors a, b and scalars $\alpha, \beta$ it holds that $H(\alpha a + \beta b) = H(a)^{\alpha} H(b)^{\beta}$. Collision
resistance implies (via standard arguments) that if one knows vectors a, b, c for
which $H(c) = H(a)^{\alpha} H(b)^{\beta}$, then it must be the case that $c = \alpha a + \beta b$.

A concrete example of a homomorphic hash function is given by what we 
call the exponential homomorphic hash (EHH) scheme. Let G be a cyclic group 
of order p, and let the public key contain random generators $g_1, \ldots, g_n \in G$.
Define a function H on vectors $v = (v_1, \ldots, v_n) \in \mathbb{Z}_p^n$ as

$$H(v) = \prod_{i=1}^{n} g_i^{v_i}. \qquad (2)$$

2 A symmetric-key analogue is also possible 1911 1, but this allows only a (single) target
to verify validity of vectors.
The homomorphic property is easily verified, and collision resistance is implied
by the discrete logarithm assumption in G.
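The homomorphic property of the EHH scheme in Eq. (2) can be verified numerically. The sketch below assumes a toy group, the order-1019 subgroup of quadratic residues modulo the safe prime 2039 = 2 · 1019 + 1, which is far too small for security and is used only to show that $H(\alpha a + \beta b) = H(a)^{\alpha} H(b)^{\beta}$ when vectors and scalars live in $\mathbb{Z}_p$. The names `keygen` and `ehh` are hypothetical.

```python
import random

P_MOD = 2039   # safe prime 2039 = 2 * 1019 + 1 (toy size, illustration only)
P_ORD = 1019   # prime order p of the subgroup of quadratic residues mod 2039

def keygen(n, rng):
    """Random generators g_1..g_n: squares of x in [2, P_MOD - 2] are never 1,
    and every non-identity element of a prime-order group generates it."""
    return [pow(rng.randrange(2, P_MOD - 1), 2, P_MOD) for _ in range(n)]

def ehh(gens, v):
    """H(v) = prod_j g_j^{v_j} mod P_MOD for v in Z_p^n (Eq. (2))."""
    out = 1
    for g, vj in zip(gens, v):
        out = out * pow(g, vj % P_ORD, P_MOD) % P_MOD
    return out

rng = random.Random(0)
n = 6
gens = keygen(n, rng)
a = [rng.randrange(P_ORD) for _ in range(n)]
b = [rng.randrange(P_ORD) for _ in range(n)]
alpha, beta = rng.randrange(P_ORD), rng.randrange(P_ORD)
combo = [(alpha * ai + beta * bi) % P_ORD for ai, bi in zip(a, b)]
# Homomorphic property: H(alpha*a + beta*b) = H(a)^alpha * H(b)^beta
assert ehh(gens, combo) == pow(ehh(gens, a), alpha, P_MOD) * pow(ehh(gens, b), beta, P_MOD) % P_MOD
```

Reducing exponents modulo the group order is safe here because every generator has order exactly p.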

Homomorphic hash functions can be used for network coding as follows. For
each original vector $v^{(i)}$, the source S computes $h_i = H(v^{(i)})$; it then signs
$(h_1, \ldots, h_m)$ (together with a unique file identifier fid) using a standard signature
scheme. The $\{h_i\}$ and their signature are then appended to every packet
sent in the network.³ A node can determine whether a vector $w = (u \| v)$ is
valid by checking the signature on the $\{h_i\}$ (and the fid), and then verifying
whether $\prod_{i=1}^{m} h_i^{u_i} = H(v)$. In particular, for the EHH scheme $h_i = H(v^{(i)})$ and
verification takes the form:

$$\prod_{i=1}^{m} h_i^{u_i} = H(v) \stackrel{\text{def}}{=} \prod_{j=1}^{n} g_j^{v_j}. \qquad (3)$$

The resulting network coding signature scheme can be proven secure without
random oracles based on the discrete logarithm assumption [THE].
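The verification check of Eq. (3) can be sketched as follows, reusing the same toy group as an assumption (order-1019 subgroup modulo 2039, illustration only). The source's standard signature on $(\mathrm{fid}, h_1, \ldots, h_m)$ is omitted and assumed to have been verified already; `verify_packet` and the other names are hypothetical. An honest combination passes, while changing a single data coordinate multiplies $H(v)$ by a generator and is rejected.

```python
import random

P_MOD, P_ORD = 2039, 1019   # toy group: order-1019 subgroup of Z_2039^* (illustration only)

def keygen(n, rng):
    # squares of x in [2, P_MOD - 2] are non-identity, hence generators of the subgroup
    return [pow(rng.randrange(2, P_MOD - 1), 2, P_MOD) for _ in range(n)]

def ehh(gens, v):
    out = 1
    for g, vj in zip(gens, v):
        out = out * pow(g, vj % P_ORD, P_MOD) % P_MOD
    return out

def verify_packet(hashes, gens, w, m):
    """Check Eq. (3): prod_i h_i^{u_i} == H(v) for w = (u || v).  Checking the source's
    signature on (fid, h_1, ..., h_m) is omitted here and assumed to have succeeded."""
    u, v = w[:m], w[m:]
    lhs = 1
    for h, ui in zip(hashes, u):
        lhs = lhs * pow(h, ui % P_ORD, P_MOD) % P_MOD
    return lhs == ehh(gens, v)

rng = random.Random(7)
m, n = 3, 4
gens = keygen(n, rng)
F = [[rng.randrange(P_ORD) for _ in range(n)] for _ in range(m)]
hashes = [ehh(gens, row) for row in F]                     # h_i = H(v^(i)), signed by the source
source = [[int(i == j) for j in range(m)] + F[i] for i in range(m)]
alphas = [rng.randrange(P_ORD) for _ in range(m)]
w = [sum(a * p[k] for a, p in zip(alphas, source)) % P_ORD for k in range(m + n)]  # honest
polluted = w[:m] + [(w[m] + 1) % P_ORD] + w[m + 1:]        # one tampered data coordinate
print(verify_packet(hashes, gens, w, m), verify_packet(hashes, gens, polluted, m))  # True False
```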

When using homomorphic hashing, the only change in the processing done by
intermediate nodes is to verify the hash and forward the authentication
information. However, the linear network coding operations performed by intermediate
nodes are now done over the (large) field $\mathbb{F} = \mathbb{Z}_p$.

Homomorphic signature schemes fieimiei. Here, the full signature (and
not just the hash) is homomorphic. Namely, the signature scheme has the
property that for any vectors a, b and scalars $\alpha, \beta$, it holds that
$\mathrm{Sign}(\alpha a + \beta b) = \mathrm{Sign}(a)^{\alpha} \mathrm{Sign}(b)^{\beta}$. The security property, roughly speaking, is that given the
signatures of vectors $w^{(1)}, \ldots, w^{(m)}$, it is only feasible to generate signatures on
vectors in the linear span of $w^{(1)}, \ldots, w^{(m)}$. The application to network coding
is immediate: the source S signs each augmented vector and transmits each
together with its signature $\mathrm{Sign}(w^{(i)})$. An intermediate node I that receives
a set of incoming vectors with their corresponding signatures will (i) verify the
signatures (discarding any vector whose signature is invalid) and (ii) compute
(using the homomorphic property) a valid signature on each outgoing vector
that it generates. Thus, in addition to the normal network coding processing,
intermediate nodes must now generate a signature on each outgoing packet. On
the other hand, the per-packet communication overhead due to the signature is
now constant rather than linear in m as in the case of homomorphic hashing.

A concrete example of a homomorphic signature scheme (the BFKW scheme)
was given by Boneh et al. [6]; the scheme can be proven secure based on the CDH
assumption in the random oracle model. We provide a description here for future
reference. To begin, the source S establishes a public key as follows:

1. Generate $\mathcal{G} = (G, G_T, p, e)$ where $G, G_T$ are groups of prime order $p$, and
$e : G \times G \to G_T$ is a bilinear map. Choose random $g_1, \ldots, g_n, h \in G$.

2. Choose $s \leftarrow \mathbb{Z}_p$, and set $f := h^s$.

3. Let $H : \{0,1\}^* \times \mathbb{Z} \to G$ be a hash function, modeled as a random oracle.

4. Output the public key $PK = (\mathcal{G}, H, g_1, \ldots, g_n, h, f)$ and the private key $s$.

3 In some settings, there may be alternate ways to distribute the $\{h_i\}$ authentically.
To sign a vector $w = (u \| v) \in \mathbb{Z}_p^{m+n}$ associated with the file identifier fid, the
source S computes the signature

$$\sigma := \Big( \prod_{i=1}^{m} H(\mathrm{fid}, i)^{u_i} \cdot \prod_{j=1}^{n} g_j^{v_j} \Big)^{s}.$$

(Note that the above can be viewed as applying a cryptographic operation [that
depends on the secret key] to a homomorphic hash of w.) An intermediate node
who knows PK can verify validity of a vector $w = (u \| v)$ with associated
signature $\sigma$ by checking whether

$$e(\sigma, h) = e\Big( \prod_{i=1}^{m} H(\mathrm{fid}, i)^{u_i} \cdot \prod_{j=1}^{n} g_j^{v_j},\; f \Big). \qquad (4)$$

Upon receiving vectors $w^{(1)}, \ldots, w^{(\ell)}$ with valid signatures $\sigma_1, \ldots, \sigma_\ell$, an
intermediate node can generate a valid signature on any linear combination
$w = \sum_{i=1}^{\ell} \alpha_i w^{(i)}$ by computing $\sigma := \prod_{i=1}^{\ell} \sigma_i^{\alpha_i}$.
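The BFKW algebra (signing, the check in Eq. (4), and combining signatures without the secret key) can be traced in a minimal Python sketch. As in any pairing-free toy, this relies on an insecure assumption: each element $g^a$ of $G$ is represented by its exponent $a$ modulo a stand-in prime, so $e(g^a, g^b)$ becomes $a \cdot b$. The random oracle $H(\mathrm{fid}, i)$ is replaced by SHA-256 reduced modulo that prime, which is a hypothetical encoding chosen only for illustration, as are all function names.

```python
import hashlib
import secrets

# Toy exponent model (ASSUMPTION, insecure): g^a in G is stored as a mod P, and
# e(g^a, g^b) as a*b mod P.  It only checks the algebra of signing, Eq. (4) and combining.
P = 2**127 - 1   # prime stand-in for the group order p

def rand_nonzero():
    return 1 + secrets.randbelow(P - 1)

def hash_to_group(fid, i):
    """Stand-in for the random oracle H(fid, i): SHA-256 reduced mod P (hypothetical encoding)."""
    digest = hashlib.sha256(f"{fid}|{i}".encode()).digest()
    return int.from_bytes(digest, "big") % P

def keygen(n):
    gens = [rand_nonzero() for _ in range(n)]
    h, s = rand_nonzero(), rand_nonzero()
    return (gens, h, h * s % P), s            # PK = (g_1..g_n, h, f = h^s); private key s

def base(pk, fid, u, v):
    """prod_i H(fid, i)^{u_i} * prod_j g_j^{v_j}, in exponent form."""
    total = sum(hash_to_group(fid, i + 1) * ui for i, ui in enumerate(u))
    total += sum(g * vj for g, vj in zip(pk[0], v))
    return total % P

def sign(pk, s, fid, u, v):
    return base(pk, fid, u, v) * s % P        # (base)^s

def verify(pk, fid, u, v, sigma):
    _, h, f = pk
    return sigma * h % P == base(pk, fid, u, v) * f % P     # e(sigma, h) == e(base, f)

def combine_sigs(sigmas, alphas):
    return sum(a * sg for a, sg in zip(alphas, sigmas)) % P  # prod_i sigma_i^{alpha_i}

m, n = 3, 4
pk, s = keygen(n)
fid = "example-fid"                                          # hypothetical file identifier
F = [[secrets.randbelow(P) for _ in range(n)] for _ in range(m)]
units = [[int(i == j) for j in range(m)] for i in range(m)]
sigs = [sign(pk, s, fid, units[i], F[i]) for i in range(m)]  # produced by the source
alphas = [3, 0, 250]                                          # chosen by an intermediate node
v = [sum(a * row[j] for a, row in zip(alphas, F)) % P for j in range(n)]
sigma = combine_sigs(sigs, alphas)                            # no secret key needed
print(verify(pk, fid, alphas, v, sigma), verify(pk, fid, alphas, [(v[0] + 1) % P] + v[1:], sigma))
```

This prints `True False`: the combined signature verifies on the honest combination (whose u-portion is exactly `alphas`), while a single modified data coordinate is rejected.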


3 Network Coding over the Integers 

In this section we describe the idea of implementing network coding over the
integers rather than over a finite field. This approach is essential for the
cryptographic schemes we propose in the following sections, and also results in efficiency
improvements for existing schemes as we describe here.

Let us first motivate this departure from traditional network coding.
Examining the signature schemes described in the previous section, one can see that
they result in significant performance penalties relative to basic (insecure)
network coding, in terms of both communication and computation. The increase
in communication is due to the fact that instead of working over a small (e.g.,
8-bit) field as in basic network coding, the cryptographic schemes work modulo
a 160-bit prime p. Each file vector is thus augmented by m coordinates of 160 bits each,
rather than 8 bits each as in basic network coding, a 20-fold increase
in the communication overhead. This also impacts computation; for example,
the time required to verify signatures when using the EHH scheme (cf. Eq. (3))
is proportional to the bit length of the exponents (i.e., the coefficients $\{u_i\}$).
A similar effect can be observed in the time required to compute signatures at
intermediate nodes when using the BFKW scheme (cf. the computation of $\sigma := \prod_{i} \sigma_i^{\alpha_i}$ in Section 2.2).

To alleviate these performance costs, our approach will be to choose small
integer coefficients as opposed to 160-bit scalars as in previous schemes. In more
detail: we now view the file F transmitted by the source S as a sequence of
vectors $v^{(1)}, \ldots, v^{(m)}$ with integer coordinates. (At this point we do not specify
the dimension of these vectors or the range of the coordinates; these details
will depend on the specific cryptographic scheme used.) These vectors are
augmented with unit vectors $u^{(1)}, \ldots, u^{(m)}$ as described in Section 2.1. Intermediate
nodes will again compute outgoing packets as random linear combinations of
incoming vectors, except that now these combinations are taken over the integers
and the coefficients $\alpha_i$ are chosen uniformly from $Q = \{0, \ldots, q-1\}$ for some
small prime q (e.g., q = 257). (A hybrid approach using small integer coefficients
but with linear combinations performed modulo a large prime is studied in
Section 3.1. In no case are the computations done modulo q.) We stress that the
coordinates of the file vectors $v^{(1)}, \ldots, v^{(m)}$ need not lie in Q.

Recall from Section 2.1 that the usefulness of random linear network coding
depends on the decoding probability, namely, the probability with which a
recipient can correctly reconstruct the file transmitted by the source. Technically,
this is the probability that the recipient collects m vectors whose u-portions
form an invertible matrix U (see Eq. (1)). For the setting described above, where
operations are performed over the integers, we must re-analyze the decoding
probability since existing bounds hold only when network coding is performed
over a finite field. Fortunately, we show that working over the integers can only
improve the decoding probability, in a sense we make formal now.

Lemma 1. Fix q prime. For any network, the decoding probability when network
coding is performed over the integers with intermediate nodes choosing
coefficients uniformly from $Q = \{0, \ldots, q-1\}$ is at least the decoding probability
when network coding is performed modulo q (with intermediate nodes choosing
coefficients uniformly from Q).

Proof. Fix a sequence of coefficients $\alpha_i \in Q$ chosen by the intermediate nodes
during a run of the network coding protocol when operations are performed
modulo q. Assume these coefficients lead to successful recovery of the file in this
case. This means that the target receives m vectors such that the u-portions
of these vectors give a matrix U with $\det(U) \neq 0$ (computed modulo q). Note
that $\det(U) \bmod q$ is unchanged if no modular reductions are performed in the
network, but instead all modular reductions are ‘delayed’ and performed only by
the target. But if U is an integer matrix, $\det(U) \not\equiv 0 \pmod{q}$ implies $\det(U) \neq 0$
over the integers; thus, successful recovery would occur for these same coefficients
if all operations were performed over the integers. □
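The key step of the proof, that delaying every modular reduction to the target leaves $\det(U) \bmod q$ unchanged, can be observed in a small simulation. The sketch below is illustrative only and rests on assumptions not in the text: a path of two hops, each hop a single node emitting m combinations, 16-bit integer file coordinates, and exact rational arithmetic via Python's `fractions.Fraction` for decoding over the integers. It feeds the same coefficients through an integer run and a mod-q run and checks that the u-portions agree mod q, that the determinants are congruent, and that whenever decoding succeeds mod q, $F = U^{-1}V$ is recovered exactly over the integers.

```python
import random
from fractions import Fraction

def combine(packets, coeffs, modulus=None):
    """Linear combination sum_i coeffs[i] * packets[i]; reduced mod `modulus` only if given."""
    out = [sum(c * p[k] for c, p in zip(coeffs, packets)) for k in range(len(packets[0]))]
    return [x % modulus for x in out] if modulus else out

def gauss_jordan(rows, m):
    """Row-reduce m rows over the rationals, pivoting on the first m columns.
    Returns (det of the left m x m block, reduced rows), or (0, None) if that block is singular."""
    a = [[Fraction(x) for x in row] for row in rows]
    det = Fraction(1)
    for col in range(m):
        pivot = next((r for r in range(col, m) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0), None
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                        # a row swap flips the sign of the determinant
        piv = a[col][col]
        det *= piv
        a[col] = [x / piv for x in a[col]]
        for r in range(m):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return det, a

def simulate(seed, m=3, n=4, q=257, hops=2):
    """Run the same random coefficients through the network twice: over the integers and mod q."""
    rng = random.Random(seed)
    F = [[rng.randrange(1 << 16) for _ in range(n)] for _ in range(m)]   # integer file vectors
    source = [[int(i == j) for j in range(m)] + F[i] for i in range(m)]
    pkts_int, pkts_q = source, source
    for _ in range(hops):                    # each hop: one node emitting m combinations
        coeff_rows = [[rng.randrange(q) for _ in range(m)] for _ in range(m)]
        pkts_int = [combine(pkts_int, c) for c in coeff_rows]
        pkts_q = [combine(pkts_q, c, q) for c in coeff_rows]
    return F, pkts_int, pkts_q

q, m = 257, 3
F, pkts_int, pkts_q = simulate(seed=1, m=m, q=q)
# Delayed reduction: the integer u-portions reduced mod q equal the mod-q u-portions.
assert all(x % q == y for ri, rq in zip(pkts_int, pkts_q) for x, y in zip(ri[:m], rq[:m]))
det_int, reduced = gauss_jordan(pkts_int, m)
det_q, _ = gauss_jordan([row[:m] for row in pkts_q], m)
assert (det_int - det_q) % q == 0             # congruent matrices have congruent determinants
if det_q % q != 0:                            # decoding succeeds modulo q ...
    assert det_int != 0                       # ... hence also over the integers (Lemma 1),
    assert [row[m:] for row in reduced] == F  # and F = U^{-1} V is recovered exactly
```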

The lemma implies that in order to get a good decoding probability when working
over the integers, it suffices to choose q such that the decoding probability
when working modulo q is sufficiently good. This puts us back in the setting
of standard network coding, where the required size of the underlying field is
well studied. The appropriate choice of q depends on the network topology, required
fault tolerance, etc., but in most practical applications an 8-bit q suffices.

In fact, we expect that working over the integers with coefficients chosen from
$Q = \{0, \ldots, q-1\}$ will induce a decoding probability that is noticeably better
than working over a field of size q. If so, one could further save in bandwidth and
computation by reducing the size of q. Another variant to investigate is choosing
coefficients from the set $\{-q/2, \ldots, q/2\}$.

Coordinate growth. When we work over the integers without any modular
reduction, the size of the coordinates of the vectors transmitted in the network
increases with each traversed hop. Specifically, each hop through some node
increases the maximal coordinate of some vector in the network by a factor of at
most $\min\{mq, \ell q\}$, where $\ell$ is the in-degree of that node. (Note that even if $\ell > m$,
the incoming vectors contain a set of at most m linearly independent vectors.)
So, after L hops the first m coordinates each have magnitude at most $(mq)^L$
(since the initial m coordinates in the augmented vectors sent by the source are
0/1-valued), while the remaining coordinates have magnitude at most $M(mq)^L$,
where M is the maximal size of coordinates in the original file vectors. As
we will see, by working over the integers we obtain bandwidth improvements (in
typical networks) in spite of this coordinate growth.
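The $(mq)^L$ bound on the first m coordinates can be compared with what a simple simulation produces. This is a sketch under assumptions of our own: a linear path where each hop is one node that receives m packets and emits m combinations with coefficients from $\{0, \ldots, q-1\}$, and hypothetical parameters m = 8, q = 257. It tracks only the u-portions and prints their observed bit length next to $\lceil L \log_2(mq) \rceil$ and the 160 bits per coefficient needed when working modulo p.

```python
import math
import random

def max_u_after_hops(L, m, q, rng):
    """Largest u-coordinate after L hops, where each hop is one node that receives m
    packets and emits m combinations with coefficients from {0, ..., q-1} (no reduction)."""
    packets = [[int(i == j) for j in range(m)] for i in range(m)]   # u-portions start 0/1-valued
    for _ in range(L):
        new_packets = []
        for _ in range(m):
            coeffs = [rng.randrange(q) for _ in range(m)]
            new_packets.append([sum(c * p[k] for c, p in zip(coeffs, packets)) for k in range(m)])
        packets = new_packets
    return max(max(row) for row in packets)

rng = random.Random(11)
m, q = 8, 257                      # hypothetical file size and coefficient range
for L in (1, 5, 10, 20):
    observed = max_u_after_hops(L, m, q, rng)
    assert observed <= (m * q) ** L                  # the (mq)^L bound from the text
    bound_bits = math.ceil(L * math.log2(m * q))
    print(f"L = {L:2d}: observed {observed.bit_length():3d} bits, "
          f"bound {bound_bits:3d} bits, vs 160 bits mod p")
```

For short paths the integer coordinates stay well below 160 bits, while for long enough paths the growth overtakes the fixed 160-bit cost, in line with the text's emphasis on moderate-length paths.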

We remark also that an attacker can generate large valid packets by choosing
large coefficients, thus countering some of the bandwidth gains achieved by
having honest nodes use small coefficients. (Note, however, that nodes may be able
to reject suspiciously large packets, e.g., those that deviate significantly from
the average packet size received at the node, or packets whose coefficients exceed
an upper bound derived from the distance between the node and the source.)
Network coding signatures cannot and do not prevent all forms of denial of
service; their purpose is to prevent pollution attacks that are easy for an attacker
to carry out yet have a devastating effect.


3.1 Improvements to Existing Schemes 

We consider here a “hybrid” variant where intermediate nodes choose small
integer coefficients but operations are performed modulo a large prime p. This
approach allows us to significantly improve the performance of the schemes
described in Section 2.2 while keeping their security guarantees intact.

In the schemes described in Section 2.2, network coding is done modulo a large
prime p. That is, the original vectors $v^{(1)}, \dots, v^{(m)}$ transmitted by the source have
entries in $\mathbb{Z}_p$; the coefficients for the linear combinations are chosen at random from $\mathbb{Z}_p$;
and the linear combinations are performed modulo p. Here we suggest keeping
these schemes unchanged except that the random coefficients chosen by each
intermediate node are taken from the set $Q = \{0, \dots, q-1\}$ for some small
prime q (we stress that linear combinations are still computed modulo p).
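A minimal sketch of an intermediate node in this hybrid variant follows. It assumes packets are Python lists of integers and uses the Mersenne prime $2^{61} - 1$ purely as a stand-in for the (larger, e.g. 160-bit) prime p; the function name `hybrid_combine` is hypothetical. Because the coefficients are smaller than p, a combination of source packets carries its coefficients verbatim in the u-portion.

```python
import random

P = 2**61 - 1  # stand-in prime modulus; the schemes use a 160-bit (or larger) p


def hybrid_combine(incoming, q, rng, p=P):
    """Combine packets with small coefficients from Q = {0, ..., q-1},
    reducing every coordinate modulo the large prime p."""
    coeffs = [rng.randrange(q) for _ in incoming]
    out = [sum(c * w[k] for c, w in zip(coeffs, incoming)) % p
           for k in range(len(incoming[0]))]
    return coeffs, out
```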

We first analyze the effect of this change on the decoding probability, showing
that the decoding probability remains high as long as (1) p is a random k-bit
prime, and (2) m and the maximal path length L from the source to the target
are negligible relative to $2^k$ (the latter is the case in our applications, where k is
typically 160 or larger).

Lemma 2. Fix q prime. For any network, the decoding probability of the “hybrid”
scheme described above (where intermediate coefficients are chosen at random
from $Q = \{0, \dots, q-1\}$ and the linear combinations are performed modulo
a random k-bit prime p) is at least the decoding probability when network coding
is performed modulo q (with intermediate nodes choosing coefficients uniformly
from Q), up to an $O((Lm \log mq)/2^k)$ additive term.
Proof. As in the case of Lemma 1, we may assume that all linear combinations
in the network are performed over the integers, and all modular reductions
are performed only at the end by the target node. Fix some set of coefficients,
chosen by all intermediate nodes, for which reconstruction of the file (when
operations are performed mod q) succeeds. Letting $U^*$ denote the integer matrix
computed at the target, this means that $\det(U^*) \not\equiv 0 \pmod q$, which, in turn,
implies $\det(U^*) \neq 0$ (over the integers). We now show that except with probability
$O(Lm \log mq / 2^k)$ over the choice of p, it also holds that $\det(U^*) \not\equiv 0 \pmod p$.

Let d denote the bit-length of $\det(U^*)$. The number of primes of length k
dividing $\det(U^*)$ is at most $d/k$, and the number of primes of length k is $\Theta(2^k/k)$.
Thus the probability that p divides $\det(U^*)$ is at most $O(d/2^k)$. It remains to
bound $d = O(\log |\det(U^*)|)$.

The matrix $U^*$ is composed of the u-portions of the vectors received by the target.
As seen before, the u-coordinates of such vectors have magnitude at most $(mq)^L$,
where L is the maximal path length from the source to the target. So $U^*$ is
an $m \times m$ matrix with each entry having magnitude at most $(mq)^L$. Thus,
$|\det(U^*)| \le m!\,(mq)^{Lm} \le (mq)^{m(L+1)}$ and $d = O(Lm \log mq)$. □
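The counting in this proof can be made concrete. The helpers below are hypothetical illustrations: one evaluates the bit-length bound on $\det(U^*)$, and the other uses the fact that every k-bit prime is at least $2^{k-1}$, so an integer of d bits has at most $d/(k-1)$ distinct k-bit prime factors (the $d/k$ of the text, up to rounding). A trial-division helper checks that bound on a small example.

```python
import math


def det_bits_bound(m, q, L):
    """Bit-length bound from |det(U*)| <= m! (mq)^(Lm) <= (mq)^(m(L+1))."""
    return math.ceil(m * (L + 1) * math.log2(m * q))


def max_kbit_prime_factors(d, k):
    """Each k-bit prime is >= 2^(k-1), so a nonzero d-bit integer has at most
    d // (k-1) distinct k-bit prime factors."""
    return d // (k - 1)


def kbit_prime_factors(x, k):
    """Distinct prime factors of x with exactly k bits (trial division; small x only)."""
    x, f, found = abs(x), 2, []
    while f * f <= x:
        if x % f == 0:
            if f.bit_length() == k:
                found.append(f)
            while x % f == 0:
                x //= f
        f += 1
    if x > 1 and x.bit_length() == k:
        found.append(x)
    return found
```

For instance, with m = 32, q = 251, L = 16 and k = 160, `det_bits_bound` gives 7057 bits, so at most 44 of the roughly $2^{152}$ primes of 160 bits can divide $\det(U^*)$.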

We proceed to examine how using small integer coefficients can improve the
performance of the network coding signature schemes discussed in Section 2.2.

Saving bandwidth. In the two schemes reviewed in Section 2.2, all vectors
transmitted in the network are prepended with a u-portion consisting of m
coordinates, each 160 bits in length. (For simplicity, we assume here that p is a
160-bit prime.) Using our approach, all vectors are prepended with a u-portion
consisting of m integer coordinates, each of whose length is at most 160 bits (since
we are still performing reduction modulo p). On average, however, the length of
these coordinates can be much smaller.⁴ For example, assume the maximum path
length is 16 hops and u-coordinates increase by at most 10 bits per hop (this is the
case, e.g., if $\ell = 4$ and $q = 251$). After the first hop the u-coordinates are at most
10 bits long; after the second hop they are at most 20 bits long, etc. Thus, in the
worst case we use (on average over all hops) 80 bits per coordinate, which reduces
the bandwidth of the u-components by a factor of two as compared to the case
when intermediate nodes choose coefficients from $\mathbb{Z}_p$. Better improvements are
obtained when average path lengths are shorter; even when average path lengths
are longer, our approach can never perform worse than the basic approach.
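The arithmetic of this example can be reproduced as follows. The sketch assumes, as the example does, that a u-coordinate has at most $t \cdot s$ bits after t hops, with $s = \lceil \log_2(\ell q) \rceil$, capped at 160 bits by the reduction modulo p; it averages over hop counts 0 through 16 and counts the source's 0/1 coordinates as 0 bits. The function name is hypothetical.

```python
import math


def avg_u_bits(max_hops, ell, q, cap=160):
    """Per-hop growth s and the average u-coordinate length over hops 0..max_hops."""
    s = math.ceil(math.log2(ell * q))          # bits added per hop, at most
    lengths = [min(t * s, cap) for t in range(max_hops + 1)]
    return s, sum(lengths) / len(lengths)


s, avg = avg_u_bits(16, 4, 251)
print(s, avg, avg / 160)   # 10 80.0 0.5
```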

Saving computation. Reducing the bit-length of the u-coordinates yields
computational savings as well, due to the use of shorter exponents during verification
(cf. the verification equations of the schemes in Section 2.2). A major improvement is also obtained in the computation
required by intermediate nodes in generating the signatures of their outgoing
vectors when using the BFKW scheme, precisely due to the use of small
coefficients. Using 8-bit rather than 160-bit coefficients gives a 20-fold improvement for this operation, regardless of the
average path length in the network. See also the following remark.
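The 20-fold figure reflects the fact that the cost of square-and-multiply exponentiation grows linearly with the bit-length of the exponent. The instrumented sketch below, a generic textbook algorithm rather than the authors' implementation, counts the modular squarings and multiplications for a 160-bit and an 8-bit exponent.

```python
def modexp_count(base, exp, mod):
    """Left-to-right square-and-multiply.
    Returns (base**exp % mod, number of squarings, number of extra multiplications)."""
    if exp == 0:
        return 1 % mod, 0, 0
    result, squarings, mults = base % mod, 0, 0
    for bit in bin(exp)[3:]:                 # skip '0b' and the leading 1 bit
        result = result * result % mod
        squarings += 1
        if bit == "1":
            result = result * base % mod
            mults += 1
    return result, squarings, mults


big = modexp_count(3, 2**159 + 1, 10007)    # 160-bit exponent
small = modexp_count(3, 2**7 + 1, 10007)    # 8-bit exponent
print(big[1:], small[1:])                   # (159, 1) (7, 1)
```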

4 Note that the coordinates of the v-portion of the vectors are not affected by the use
of small coefficients; in both cases these are always 160-bit values.
Remark 1. Signature verification can be done on an opportunistic basis by
intermediate nodes (e.g., for a random subset of vectors). In contrast, signature
computation must be done by all intermediate nodes for each outgoing packet.

4 An RSA-Based Network Coding Signature Scheme 

In this section we present an RSA-based network coding signature scheme that
enjoys a proof of security in the random oracle model under the RSA
assumption, and relies on the ability to perform random linear network coding over
the integers as described in Section 3. The scheme is similar to the BFKW
scheme and adapts ideas from [3|I4| in the same way the BFKW scheme
borrows from m-. We construct a homomorphic signature scheme by applying a
multiplicatively homomorphic signature to a homomorphic hash of the vector
being signed. The homomorphic hash we use is similar to the EHH scheme,
except that we work modulo an RSA composite rather than modulo a prime. We
take as our multiplicatively homomorphic signature the “textbook RSA” scheme,
where a signature on x is just $x^d \bmod N$. The resulting scheme is presented in
Section 4.1.

In order to use the resulting scheme for network coding, it is essential that the
linear operations being performed by the nodes “work” relative to an unknown
modulus (which arises in our case because $\varphi(N)$ is unknown). To achieve this, we
have intermediate nodes perform network coding over the integers. We describe
this in detail in Section 4.2.

4.1 An RSA-Based Homomorphic Signature Scheme 

We start by defining an RSA-based homomorphic signature scheme denoted Bsig. 

• Public and secret keys: Let N be a product of two safe primes; in particular,
the subgroup of quadratic residues $\mathcal{QR}_N$ is cyclic and random elements of $\mathcal{QR}_N$
are generators of this subgroup with overwhelming probability. The public key is
$(N, e, g_1, \dots, g_n)$ and the secret key is d, where $ed \equiv 1 \pmod{\varphi(N)}$ and $g_1, \dots, g_n$
are random generators of $\mathcal{QR}_N$.

• Signature generation: The signature on $v = (v_1, \dots, v_n) \in \mathbb{Z}^n$ is given by

$$\mathrm{Bsig}(v) = \Big(\prod_{j=1}^{n} g_j^{v_j}\Big)^{d} \bmod N. \tag{5}$$

Verification is done in the obvious way. It is easy to see that this scheme is
homomorphic: for any $v, v' \in \mathbb{Z}^n$ and $\alpha, \beta \in \mathbb{Z}$, we have
$\mathrm{Bsig}(\alpha v + \beta v') = \mathrm{Bsig}(v)^{\alpha} \cdot \mathrm{Bsig}(v')^{\beta} \bmod N$.
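The homomorphic property of Bsig can be checked numerically. The sketch below uses deliberately tiny, insecure parameters chosen only for illustration (the safe primes 167 = 2·83 + 1 and 263 = 2·131 + 1, public exponent e = 367, and squares of small integers as the $g_j$); none of these values is a recommended parameter choice, and the helper names are hypothetical. Negative exponents are not needed here; `pow(E, -1, PHI)` requires Python 3.8 or later.

```python
import math

# Toy parameters for illustration only: tiny safe primes, NOT secure.
P1, P2 = 167, 263                 # 167 = 2*83 + 1, 263 = 2*131 + 1
N = P1 * P2
PHI = (P1 - 1) * (P2 - 1)
E = 367                           # public exponent; must be coprime to phi(N)
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)               # secret exponent: E*D = 1 mod phi(N)
G = [pow(r, 2, N) for r in (5, 7, 11)]   # squares, hence elements of QR_N


def hash_vec(v):
    """prod_j g_j^{v_j} mod N."""
    acc = 1
    for g, vj in zip(G, v):
        acc = acc * pow(g, vj, N) % N
    return acc


def bsig(v):
    """Eq. (5): textbook-RSA signature on the homomorphic hash."""
    return pow(hash_vec(v), D, N)


def verify(v, sig):
    return pow(sig, E, N) == hash_vec(v)


v, v2, alpha, beta = [1, 2, 3], [4, 0, 5], 3, 2
combined = [alpha * a + beta * b for a, b in zip(v, v2)]
print(bsig(combined) == pow(bsig(v), alpha, N) * pow(bsig(v2), beta, N) % N)   # True
```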

4.2 An RSA-Based Network Coding Signature Scheme 

Here we describe how the above scheme Bsig can be extended to give a network 
coding signature scheme Nsig. We first review the underlying network coding 
being performed, focusing on details not already covered in Section 3. The file
held by the source S is a sequence of vectors $v^{(1)}, \dots, v^{(m)}$, where each $v^{(i)} \in \mathbb{Z}^n$
for some value n. Note that once the size of the file and the number m of vectors are
fixed, a lower bound |v| on the bit-length of each of the vectors $v^{(i)}$ is determined,
and n can take on any value between 1 and |v|. As we will see, smaller values
of n reduce communication while larger values of n reduce computation (very
often n = 1 will provide the most practical trade-off).

As usual, before sending the vectors to the network, the source prepends
them with unit vectors $u^{(i)}$, thus producing $w^{(1)}, \dots, w^{(m)} \in \mathbb{Z}^{m+n}$. Everything
else is carried out as already described in Section 3; in particular, intermediate
nodes generate random linear combinations (over the integers) of incoming
packets, using coefficients chosen uniformly from $Q = \{0, \dots, q-1\}$ for prime q.

Let L be an upper bound on the path length from the source to any target.
(Looking ahead, the Nsig scheme defined below may reject packets that traverse
more than L hops.) Given L, we define a bound $B = (mq)^L$, which represents the
largest possible value of a u-coordinate in any (honestly generated) vector; cf.
Section 3. If M denotes an upper bound on the magnitude of the coordinates of
the initial vectors $v^{(1)}, \dots, v^{(m)}$, then the maximal magnitude of any coordinate
in an honestly generated vector is $B^* = BM$.
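The bounds B and $B^*$, together with the requirement (stated in the key-generation step below) that e be a prime with $e > mB^*$, can be computed as follows. This is a sketch under our own assumptions: it picks the smallest such prime, tested with Miller-Rabin (deterministic below about $3.3 \cdot 10^{24}$ with these bases, probabilistic above), and ignores the optional low-Hamming-weight optimization. The function names are hypothetical.

```python
def is_probable_prime(x, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)):
    """Miller-Rabin primality test with fixed bases."""
    if x < 2:
        return False
    for b in bases:
        if x % b == 0:
            return x == b
    d, s = x - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in bases:
        y = pow(a, d, x)
        if y in (1, x - 1):
            continue
        for _ in range(s - 1):
            y = y * y % x
            if y == x - 1:
                break
        else:
            return False          # a is a witness: x is composite
    return True


def nsig_params(m, q, L, M):
    """B = (mq)^L bounds u-coordinates, B* = B*M bounds v-coordinates,
    and e is the smallest (probable) prime with e > m * B*."""
    B = (m * q) ** L
    B_star = B * M
    e = m * B_star + 1
    while not is_probable_prime(e):
        e += 1
    return B, B_star, e


print(nsig_params(2, 3, 2, 5))   # (36, 180, 367)
```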

We now introduce our scheme Nsig. 

• Parameters: m, n, M, B, and $B^*$.

• Public and secret keys: The public key $(N, e, g_1, \dots, g_n)$ and the secret key d
are as in Bsig, except that e is chosen to be a prime with $e > mB^*$ (for efficiency
reasons e can be chosen to have low Hamming weight). In addition, the scheme
uses a public hash function $H : \{0,1\}^* \to \mathcal{QR}_N$ that will be modeled as a
random oracle.

• Signature generation by source S: On input a file given by m vectors
$v^{(1)}, \dots, v^{(m)} \in \mathbb{Z}^n$, the source S generates the augmented vectors
$w^{(i)} = u^{(i)} \,\|\, v^{(i)} \in \mathbb{Z}^{m+n}$ in the usual way. S chooses a random $\mathrm{fid} \in \{0,1\}^k$ and
computes $h_i = H(i, \mathrm{fid})$ for $i = 1, \dots, m$. The signature on each vector
$w = (u_1, \dots, u_m, v_1, \dots, v_n)$ is:

$$\mathrm{Nsig}(w) = \Big(\prod_{i=1}^{m} h_i^{u_i} \cdot \prod_{j=1}^{n} g_j^{v_j}\Big)^{d} \bmod N. \tag{6}$$

S transmits each $w^{(i)}$ along with its signature and fid.

• Signature verification: Given $w = u \,\|\, v = (u_1, \dots, u_m, v_1, \dots, v_n) \in \mathbb{Z}^{m+n}$,
a file identifier fid, and a signature $\sigma$, verification is done as follows. Reject
immediately if any of the u-coordinates is negative or larger than B, or any of the
v-coordinates is negative or larger than $B^*$. Otherwise, compute $h_i = H(i, \mathrm{fid})$
for $i = 1, \dots, m$ and accept the signature if and only if

$$\sigma^{e} = \prod_{i=1}^{m} h_i^{u_i} \cdot \prod_{j=1}^{n} g_j^{v_j} \bmod N. \tag{7}$$
(An optimized batch verification procedure for testing multiple incoming vectors
is presented at the end of this subsection.)

• Signature combination at intermediate nodes. Upon receiving $w^{(1)}, \dots, w^{(\ell)}$
associated with the same fid and with valid signatures $\sigma_1, \dots, \sigma_\ell$, an intermediate
node proceeds as follows. It first discards any $w^{(i)}$ having a u-coordinate larger
than $B/mq$ or a v-coordinate larger than $B^*/mq$.⁵ For simplicity we continue to
denote the non-discarded vectors by $w^{(1)}, \dots, w^{(\ell)}$. The intermediate node then
chooses random coefficients $\alpha_1, \dots, \alpha_\ell \in Q$, sets $w = \sum_{i=1}^{\ell} \alpha_i w^{(i)}$, and computes
the signature on w as:

$$\sigma = \prod_{i=1}^{\ell} \sigma_i^{\alpha_i} \bmod N. \tag{8}$$
We prove security of this scheme in the following subsection. First, we compare
the performance of our scheme to the original BFKW scheme and the variant
BFKW scheme (using small integer coefficients) described in Section 3.1.

Bandwidth. The lengths of the coordinates of the vectors w transmitted in
the network increase by at most $s = \log(mq)$ bits for each traversed hop. Thus,
after t hops each u-coordinate has bit-length at most ts. If s = 10, for example,
then it will take 32 hops before the total communication overhead due to the
u-coordinates exceeds that of the original BFKW scheme (where u-coordinates
are always of size 160 bits). For most networks, where maximum path lengths
are expected to be much less than 32 hops, Nsig therefore incurs lower overhead.
Comparing Nsig to the variant BFKW scheme described earlier in this work,
we see that the two schemes have the same overhead until coordinates reach
160 bits; after that the variant BFKW scheme performs better (since in Nsig
coordinates keep growing while in BFKW they do not). This, however, does not
take into account the fact that in Nsig the v-coordinates also increase while in
BFKW they do not. Fortunately, we can choose n to be small (e.g., n = 1), thus
making this overhead insignificant (see more below regarding the choice of n).
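One way to read the 32-hop figure, which we assume here, is as a comparison of cumulative per-coordinate overhead: Nsig spends at most $t \cdot s$ bits on hop t, while the original BFKW scheme spends 160 bits on every hop. Under that reading the crossover can be computed directly; the function name is hypothetical.

```python
def first_hop_exceeding(s, fixed_bits=160):
    """Smallest T with sum_{t=1..T} t*s > T*fixed_bits (requires s > 0)."""
    T, total = 0, 0
    while True:
        T += 1
        total += T * s              # u-coordinate length on hop T in Nsig
        if total > T * fixed_bits:  # BFKW pays fixed_bits on every hop
            return T


print(first_hop_exceeding(10))   # 32
```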

Computation. The most critical operation is signature generation at
intermediate nodes (see Remark 1 in Section 3.1). In Nsig this operation is extremely
efficient since the exponents $\alpha_i$ in Eq. (8) are small (say, 8 bits each). Thus, we
can expect this operation to be roughly 20 times faster in Nsig than in the
original BFKW scheme. (The variant BFKW scheme is expected to perform about
as well as Nsig.) Verification is more expensive. Looking at Eq. (7), we see that
verification in Nsig requires an exponentiation using $(|v| + (m+n)\log B)$-bit
exponents and a $(\log m + \log B + |v|/n)$-bit exponent (i.e., the bit-length of e).
Since the impact of n is more significant with regard to bandwidth than
computation, in most cases it makes sense to choose n = 1. The resulting cost of
verification is still better than that of the original BFKW scheme due to the
pairing operation and the cost of hashing onto the bilinear groups in the latter.
The cost of a hashing operation in this case is equivalent to a full exponentiation,
and it is needed for computing each of the m values $h_i = H(\mathrm{fid}, i)$ (and also to
compute the generators $g_1, \dots, g_n$ in the case that one implements BFKW with a
fixed-size public key). In contrast, in the case of Nsig the computational cost of
the hashing operations is negligible. Moreover, if one uses n = 1, the resulting
public key has a single generator, while in BFKW one needs |v|/160 of them (e.g.,
for a 4 Kbyte |v|, BFKW requires 200 generators). The cost of computation can
be further improved by resorting to batch verification of incoming vectors, as
described next.

5 These bounds are more restrictive than those required by signature verification, and
are intended to ensure that signature verification will succeed at the next hop.

Batch verification. The most expensive operation in the Nsig scheme is the
verification of incoming signatures. Here we show that instead of verifying each
incoming signature it suffices to verify just one outgoing signature. The
probability that the verification of this outgoing vector succeeds but one of the incoming
vectors was invalid is at most 1/q. This is fine in most cases since even if a node
forwards an invalid vector, this will be caught with high probability by
subsequent nodes (i.e., the probability that t consecutive honest nodes do not discover
a forgery is at most $1/q^t$). To achieve this optimization we modify the actions
of intermediate nodes as follows.

Upon receiving $w^{(1)}, \dots, w^{(\ell)}$ associated with the same fid and with alleged
signatures $\sigma_1, \dots, \sigma_\ell$, intermediate node I discards any vector that has too large
coordinates as described above. Then, I generates one outgoing vector as usual,
i.e., chooses random coefficients $\alpha_1, \dots, \alpha_\ell \in Q$ and sets $w = \sum_{i=1}^{\ell} \alpha_i w^{(i)}$. It
then sets $\sigma = \prod_{i=1}^{\ell} \sigma_i^{\alpha_i} \bmod N$ and verifies (using Eq. (7)) that $\mathrm{Nsig}(w)$ equals
$\sigma$ or $-\sigma$. If this verification succeeds, then no further verifications are needed.
That is, I outputs w on one of its outgoing edges and proceeds to compute other
outgoing vectors as in the case that all incoming vectors and their signatures
were valid (with the usual random linear combinations and using Eq. (8) to
generate outgoing signatures, but without additional verifications). In this way,
the number of signature verifications at any intermediate node is 1 regardless
of the number of incoming or outgoing vectors. (Note: If the above verification
of the outgoing w fails, I may decide to discard its incoming vectors or test each
one separately to find the valid ones; the important point is that under normal
operation, i.e., without adversarial activity, a single verification suffices.)
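The batch-verification idea can be illustrated with toy, insecure parameters (n = 1, m = 2, and hypothetical fixed values standing in for $h_i = H(i, \mathrm{fid})$). One incoming packet carries a corrupted signature, and we enumerate that packet's coefficient over $Q = \{0, \dots, 4\}$: in this toy instance only the coefficient 0, which drops the bad packet from the combination, lets the single check pass, matching the 1/q bound. The coefficients (1, 2) for the honest packets are an arbitrary choice, and the helper names are hypothetical.

```python
# Toy parameters (insecure): N = 167 * 263, prime exponent E, n = 1, m = 2.
P1, P2 = 167, 263
N, PHI = P1 * P2, (P1 - 1) * (P2 - 1)
E = 367
D = pow(E, -1, PHI)
H_VALS = [pow(3, 2, N), pow(13, 2, N)]  # hypothetical stand-ins for h_i = H(i, fid)
G1 = pow(5, 2, N)                       # the single generator g_1 (n = 1)


def rhs(w):
    """h_1^{u_1} * h_2^{u_2} * g_1^{v} mod N for w = (u_1, u_2, v)."""
    u1, u2, v = w
    return pow(H_VALS[0], u1, N) * pow(H_VALS[1], u2, N) * pow(G1, v, N) % N


def sign(w):
    return pow(rhs(w), D, N)


def batch_check(packets, alphas):
    """Combine the incoming packets once and verify only the outgoing one.
    Following the text, the outgoing signature is accepted up to sign (+/-)."""
    w = [sum(a * wk[k] for a, (wk, _) in zip(alphas, packets)) for k in range(3)]
    sigma = 1
    for a, (_, s) in zip(alphas, packets):
        sigma = sigma * pow(s, a, N) % N
    target = rhs(w)
    return pow(sigma, E, N) in (target, (-target) % N)


honest = [([1, 0, 7], sign([1, 0, 7])), ([0, 1, 9], sign([0, 1, 9]))]
w_bad = [1, 1, 16]
bad = (w_bad, sign(w_bad) * 4 % N)        # correct vector, corrupted signature
q = 5
passes = [batch_check(honest + [bad], [1, 2, a]) for a in range(q)]
print(passes)   # [True, False, False, False, False]
```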

A proof of correctness of the above batch verification technique follows [5] and 
is presented in the full version. 

4.3 Proof of Security 

We now prove security of Nsig relative to the definition given in [6].

Theorem 1. Under the RSA assumption, Nsig is a secure network coding
signature scheme when the hash function H is modeled as a random oracle.

Proof. Given a forger $\mathcal{F}$ attacking Nsig, we build an algorithm S that solves the
RSA problem. Here S stands for simulator and also for source, since S will be
simulating the actions of the source being attacked by $\mathcal{F}$.
Algorithm S receives input N, e, C, where N, e are distributed as in an Nsig
public key and $C \in_R \mathcal{QR}_N$. Its goal is to output $C^{1/e} \bmod N$. (Note that if S
computes $C^{1/e} \bmod N$ for $C \in_R \mathcal{QR}_N$ with non-negligible probability, this
contradicts the standard RSA assumption, where C is chosen uniformly from $\mathbb{Z}_N^*$.)
Algorithm S begins by choosing $i_0 \in_R \{1, \dots, n\}$ and then setting $g_{i_0} := C$. For
$i \neq i_0$, algorithm S chooses $r_i \in_R \mathcal{QR}_N$ and sets $g_i := r_i^e \bmod N$. Then S calls
$\mathcal{F}$ on the public key $(N, e, g_1, \dots, g_n)$.

$\mathcal{F}$ chooses a file, represented as a set of vectors $v^{(1)}, \dots, v^{(m)}$, and requests
a signature on it. In response, algorithm S chooses $\sigma_1, \dots, \sigma_m \in_R \mathcal{QR}_N$ and
$\mathrm{fid} \in \{0,1\}^k$, and then sets (using the programmability of the random oracle H)

$$h_i = H(i, \mathrm{fid}) := \sigma_i^{e} \cdot \prod_{j=1}^{n} g_j^{-v_j^{(i)}} \bmod N. \tag{9}$$

(If fid was used previously to sign another file, S aborts. This occurs with
negligible probability and we ignore it from here on.) Finally, S gives to $\mathcal{F}$ the signature
$\sigma_i$ on the augmented vector $w^{(i)} = u^{(i)} \,\|\, v^{(i)}$ (for $i = 1, \dots, m$), along with fid.
It is easy to see that signatures are distributed exactly as in the real experiment.

Say $\mathcal{F}$ outputs a forgery, i.e., a file identifier $\mathrm{fid}^*$, a vector
$w^* \notin \mathrm{span}\{w^{(1)}, \dots, w^{(m)}\}$ (where $\{w^{(1)}, \dots, w^{(m)}\}$ is the unique set of augmented
vectors signed using $\mathrm{fid}^*$), and a valid signature $\sigma^* = \mathrm{Nsig}(w^*)$ on $w^*$. We show how S can use this
to solve its given RSA instance.

Denote $w^* = u^* \,\|\, v^* = (u_1^*, \dots, u_m^*, v_1^*, \dots, v_n^*)$, and define the vector

$$z^* = w^* - \sum_{i=1}^{m} u_i^* w^{(i)}. \tag{10}$$

Note that $z^* = (0, \dots, 0, z_1, \dots, z_n)$; that is, its first m coordinates are all zero.
Moreover, since $w^* \notin \mathrm{span}\{w^{(1)}, \dots, w^{(m)}\}$, at least one of the values $z_j$ is non-zero.
Since $i_0$ is uniform and the view of $\mathcal{F}$ is independent of it, with probability at least 1/n
we thus have $z_{i_0} \neq 0$, and we assume this to be the case from now on.

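By definition of $z^*$ and the homomorphic property of Nsig we have:

$$\mathrm{Nsig}(z^*) = \mathrm{Nsig}\Big(w^* - \sum_{i=1}^{m} u_i^* w^{(i)}\Big) = \mathrm{Nsig}(w^*) \cdot \prod_{i=1}^{m} \mathrm{Nsig}(w^{(i)})^{-u_i^*} = \sigma^* \cdot \prod_{i=1}^{m} \sigma_i^{-u_i^*} \bmod N. \tag{11}$$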

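On the other hand, we can also represent $\mathrm{Nsig}(z^*)$ as

$$\mathrm{Nsig}(z^*) = \Big(\prod_{j=1}^{n} g_j^{z_j}\Big)^{1/e} = \big(C^{z_{i_0}}\big)^{1/e} \cdot \prod_{j \neq i_0} \big(g_j^{z_j}\big)^{1/e} = \big(C^{z_{i_0}}\big)^{1/e} \cdot \prod_{j \neq i_0} r_j^{z_j} \bmod N. \tag{12}$$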
Combining Eqs. (11) and (12) we get that

$$\big(C^{z_{i_0}}\big)^{1/e} = \sigma^* \cdot \prod_{i=1}^{m} \sigma_i^{-u_i^*} \cdot \prod_{j \neq i_0} r_j^{-z_j} \bmod N,$$

from which S can compute a value x such that $x^e = C^{z_{i_0}} \bmod N$. Using a
standard trick, S can then compute $C^{1/e} \bmod N$ provided that $\gcd(z_{i_0}, e) = 1$.
But this is the case since e > mBM is prime, $z_{i_0} \neq 0$, and

$$-mBM \le z_{i_0} = v_{i_0}^* - \sum_{i=1}^{m} u_i^* v_{i_0}^{(i)} \le MB.$$

(Since $w^*$ passes verification we have $0 \le u_i^* \le B$ and $0 \le v_j^* \le MB$; it always
holds that $0 \le v_j^{(i)} \le M$.) □
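The “standard trick” referred to above (often called Shamir's trick) can be sketched as follows, using a toy modulus for illustration: since $\gcd(z, e) = 1$ there is an a with $az = 1 + ke$, and then $y = x^a \cdot C^{-k}$ satisfies $y^e = C^{az - ke} = C$. The function name is hypothetical, and negative exponents in `pow` require Python 3.8 or later.

```python
def shamir_trick(x, z, C, e, N):
    """Given x with x^e = C^z (mod N) and gcd(z, e) = 1, return y with y^e = C (mod N)."""
    a = pow(z, -1, e)          # a*z = 1 (mod e); raises ValueError if gcd(z, e) != 1
    k = (a * z - 1) // e       # exact division: a*z = 1 + k*e
    return pow(x, a, N) * pow(C, -k, N) % N   # negative exponents need C invertible mod N


N, E = 167 * 263, 367                 # toy RSA modulus and prime exponent (insecure)
y0 = 1234                             # the e-th root we pretend not to know
C = pow(y0, E, N)
x = pow(y0, 5, N)                     # satisfies x^E = C^5 (mod N)
print(shamir_trick(x, 5, C, E, N) == y0)   # True
```

Because $\gcd(e, \varphi(N)) = 1$, e-th roots in $\mathbb{Z}_N^*$ are unique, so the recovered root equals $y_0$.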

Remark 2. The above proof uses the fact that e is larger than the coordinates of
valid vectors. Indeed, if coordinates larger than e are allowed, then given a valid
vector $w = u \,\|\, v$ with signature $\sigma$, an attacker can output the forged signature
$\sigma' = \sigma \cdot g_1$ on the vector $w' = u \,\|\, v'$ with $v' = v + (e, 0, \dots, 0)$.
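The attack in Remark 2 can be reproduced with toy, insecure parameters (N = 167 · 263, e = 367, B = 36, $B^*$ = 180, and hypothetical values for the $h_i$ and $g_j$). It shows that Eq. (7) alone accepts the forged pair, while the range check on the v-coordinates rejects it.

```python
# Toy parameters (insecure): N = 167 * 263, E = 367 > m * B* = 360.
P1, P2 = 167, 263
N, PHI = P1 * P2, (P1 - 1) * (P2 - 1)
E, B, B_STAR = 367, 36, 180
D = pow(E, -1, PHI)
H_VALS = [pow(3, 2, N), pow(13, 2, N)]   # hypothetical h_1, h_2
G = [pow(5, 2, N), pow(7, 2, N)]         # hypothetical g_1, g_2


def rhs(u, v):
    acc = 1
    for h, ui in zip(H_VALS, u):
        acc = acc * pow(h, ui, N) % N
    for g, vj in zip(G, v):
        acc = acc * pow(g, vj, N) % N
    return acc


def equation_7_holds(u, v, sigma):
    """Eq. (7) alone, WITHOUT the coordinate range checks."""
    return pow(sigma, E, N) == rhs(u, v)


def verify(u, v, sigma):
    """Full Nsig verification: range checks, then Eq. (7)."""
    in_range = all(0 <= x <= B for x in u) and all(0 <= x <= B_STAR for x in v)
    return in_range and equation_7_holds(u, v, sigma)


u, v = [1, 0], [3, 4]
sigma = pow(rhs(u, v), D, N)               # honest signature
v_forged = [v[0] + E, v[1]]                # v' = v + (e, 0)
sigma_forged = sigma * G[0] % N            # sigma' = sigma * g_1
print(equation_7_holds(u, v_forged, sigma_forged), verify(u, v_forged, sigma_forged))
# True False
```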

5 Homomorphic Hashing Modulo a Composite 

Network coding signatures based on homomorphic hashing can offer significant
computational advantages relative to constructions based on homomorphic
signature schemes since, when using the former, a node that chooses not to verify an
incoming vector (cf. Remark 1) need not perform any cryptographic operations.
On the other hand, constructions based on homomorphic hashing consume more
bandwidth since nodes now need to obtain the (authenticated) hash values of
the original file vectors. If delivery of the hash values to nodes can be done in
some out-of-band fashion, however, this drawback is mitigated.

Here we introduce a homomorphic hashing scheme, denoted $H_N$, that is similar
to the EHH scheme described in Section 2.2 but where operations are performed
modulo a composite N. This results in the homomorphic properties holding over
a group of unknown order; hence this scheme can only be applied when the
underlying network coding is done over the integers. $H_N$ has better computational
efficiency than the EHH scheme (over prime-order groups) from Section 2.2 and,
although it produces larger hash values, it requires a smaller public key than the
EHH scheme.

In this section we once again assume linear network coding being performed
over the integers as described in Section 3. Namely, the file to be transmitted
is represented by vectors $v^{(1)}, \dots, v^{(m)} \in \mathbb{Z}^n$, and intermediate nodes choose
coefficients uniformly from a set $Q = \{0, \dots, q-1\}$ (for some small prime q) and
compute all linear combinations without any modular reduction.

Let N be the product of two safe primes so that the group $\mathcal{QR}_N$ of quadratic
residues modulo N is cyclic, and let $g_1, \dots, g_n$ be generators of $\mathcal{QR}_N$.
For $v = (v_1, \dots, v_n) \in \mathbb{Z}^n$, define

$$H_N(v) = \prod_{j=1}^{n} g_j^{v_j} \bmod N.$$
This is a homomorphic hash function that is collision resistant if factoring N
is hard (proof omitted). Thus, $H_N$ can serve as a basis for a network coding
signature scheme as discussed in Section 2.2. In particular, a node receiving a
vector $w = (u_1, \dots, u_m, v_1, \dots, v_n)$ can verify it by checking whether

$$\prod_{i=1}^{m} h_i^{u_i} = H_N(v) = \prod_{j=1}^{n} g_j^{v_j} \bmod N, \tag{13}$$

where $h_i = H_N(v^{(i)})$, $i = 1, \dots, m$. Below we show a batch verification
optimization that allows an intermediate node to verify all of its incoming vectors with
a single application of Eq. (13).
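A toy check of the homomorphic hash and of Eq. (13) follows. The modulus 167 · 263 and the generators (squares of 5, 7 and 11) are illustrative assumptions, far too small to be collision resistant, and the helper names are hypothetical.

```python
N = 167 * 263                              # toy modulus (insecure)
G = [pow(r, 2, N) for r in (5, 7, 11)]     # hypothetical g_1, g_2, g_3 in QR_N


def H_N(v):
    """H_N(v) = prod_j g_j^{v_j} mod N (non-negative integer coordinates)."""
    acc = 1
    for g, vj in zip(G, v):
        acc = acc * pow(g, vj, N) % N
    return acc


def check_eq13(w, h):
    """Eq. (13): prod_i h_i^{u_i} == H_N(v) for w = u || v, with m = len(h)."""
    m = len(h)
    lhs = 1
    for hi, ui in zip(h, w[:m]):
        lhs = lhs * pow(hi, ui, N) % N
    return lhs == H_N(w[m:])


v1, v2 = [1, 2, 3], [4, 0, 1]              # source blocks v^(1), v^(2)
h = [H_N(v1), H_N(v2)]                     # authenticated hash values
w = [2, 3] + [2 * a + 3 * b for a, b in zip(v1, v2)]   # 2 w^(1) + 3 w^(2)
print(check_eq13(w, h))                    # True
```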

Bandwidth considerations for this scheme, which uses integer coefficients that
grow over time, are similar to those of the RSA-based scheme from Section 4. An
additional benefit of $H_N$ is that there is no need to determine an a priori bound
on these coefficients. As in the case of the RSA-based scheme from Section 4,
one way to limit the effect of coordinate growth on the total communication is
to set n = 1. As we now discuss, this not only reduces bandwidth overhead but
also improves computational performance significantly.

Fix n = 1 so that each block of information $v^{(i)}$ is a single (long) integer.
Choosing N appropriately,⁶ we can take 2 as a generator of $\mathcal{QR}_N$, thus obtaining:

$$H_N(v) = 2^{v} \bmod N.$$

This achieves the most salient advantage of the $H_N$ scheme: fast exponentiation.
Another advantage of this homomorphic hash is that it considerably improves the
size of the public parameters relative to the EHH scheme. To see this, observe
that in the EHH scheme the total length of the set of generators $g_1, \dots, g_n$
included in the public parameters is (at least) $n \log p$, which is (at least) as large
as each information vector $v^{(i)}$. Moreover, the number of generators is usually
very large; e.g., for vectors of size 4 KB and 160-bit p, the EHH scheme
needs 200 random generators.⁷ In the case of $H_N$, on the other hand, only one
generator is needed and furthermore this generator can be fixed to 2; the public
parameters need only include N.
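The choice of N described in footnote 6 can be explored with toy sizes. The sketch below, which uses naive trial division and is meant only for illustration (the search start and helper names are hypothetical), finds primes of the required form and checks that 2 has order $p_1' p_2'$ modulo N, i.e., that it generates $\mathcal{QR}_N$.

```python
def is_prime(x):
    """Trial division; adequate only for the toy sizes used here."""
    if x < 2:
        return False
    f = 2
    while f * f <= x:
        if x % f == 0:
            return False
        f += 1
    return True


def footnote6_primes(start, count):
    """Find pairs (p', P) with P = 2p' + 1, both prime, p' = 3 (mod 8), P = 7 (mod 8)."""
    found, p = [], start
    while len(found) < count:
        P = 2 * p + 1
        if p % 8 == 3 and P % 8 == 7 and is_prime(p) and is_prime(P):
            found.append((p, P))
        p += 1
    return found


(p1, P1), (p2, P2) = footnote6_primes(50, 2)
N = P1 * P2
# 2 is a quadratic residue mod each P_i (P_i = 7 mod 8), so its order divides
# p1 * p2; it equals p1 * p2 exactly when 2^p1 != 1 and 2^p2 != 1 (mod N).
print((P1, P2), pow(2, p1 * p2, N) == 1, pow(2, p1, N) != 1, pow(2, p2, N) != 1)
# (167, 263) True True True
```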

Batch verification. The use of $H_N$ for network coding can be further optimized
by using batch verification at intermediate nodes, similarly to the procedure
described in Section 4.2 for the Nsig signature. Specifically, instead of verifying
each incoming vector using Eq. (13), an intermediate node can just generate one
outgoing vector as usual (i.e., as a random linear combination over $\{0, \dots, q-1\}$
of the incoming vectors) and then apply Eq. (13) to the resulting vector. It can
be shown that the probability that this single verification passes but one of the
incoming vectors was invalid (not in the span of $w^{(1)}, \dots, w^{(m)}$) is at most 1/q.
The probability that t consecutive nodes will be fooled into accepting invalid vectors is
at most $1/q^t$. Thus a single verification per intermediate node suffices regardless
of the number of incoming or outgoing vectors.

6 Choose $N = P_1 P_2$ with $P_1 = 2p_1' + 1$, $P_2 = 2p_2' + 1$, and $P_1, P_2, p_1', p_2'$ all prime;
$p_1', p_2' \equiv 3 \pmod 8$; and $P_1, P_2 \equiv 7 \pmod 8$.

7 The size of the public parameters can be reduced by using a hash function to compute
the generators “on the fly.” Besides necessitating the use of the random oracle model,
this also introduces additional computational overhead.
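The 1/q bound for $H_N$ batch verification can be seen in a toy instance with n = 1 and the modulus N = 167 · 263 of the footnote-6 form (an illustrative assumption; 2 has order 83 · 131 modulo this N). One incoming vector lies outside the span, and we enumerate its coefficient over $Q = \{0, \dots, 6\}$; the coefficients (1, 2, 3) for the valid vectors are arbitrary, and the helper names are hypothetical.

```python
N = 167 * 263   # toy modulus of the footnote-6 form; 2 has order 83 * 131 mod N


def H_N(v):
    """n = 1: H_N(v) = 2^v mod N."""
    return pow(2, v, N)


def passes_eq13(w, h):
    """Eq. (13) for w = (u_1, ..., u_m, v) with n = 1."""
    *u, v = w
    lhs = 1
    for hi, ui in zip(h, u):
        lhs = lhs * pow(hi, ui, N) % N
    return lhs == H_N(v)


def combine(vectors, alphas):
    """Integer linear combination, as computed by an intermediate node."""
    return [sum(a * w[k] for a, w in zip(alphas, vectors)) for k in range(len(vectors[0]))]


sources = [123, 456]                          # v^(1), v^(2) (toy file blocks)
h = [H_N(v) for v in sources]                 # authenticated hash values h_1, h_2
incoming = [[1, 0, 123], [0, 1, 456], [2, 1, 702]]   # valid vectors
bad = [1, 1, 580]                             # not in the span (123 + 456 = 579)
q = 7
passes = [passes_eq13(combine(incoming + [bad], [1, 2, 3, a]), h) for a in range(q)]
print(passes.count(True))   # 1: only the coefficient a = 0 hides the bad vector
```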

In all, we have shown that the homomorphic hashing scheme $H_N$ leads to a
computationally efficient network coding signature scheme whose security can be
proven based on the factoring assumption in the standard model (assuming also
the security of the signature scheme used to sign the hash values $h_1, \dots, h_m$).
Abstract. Network coding is a method for achieving channel capacity in
networks. The key idea is to allow network routers to linearly mix packets 
as they traverse the network so that recipients receive linear combinations 
of packets. Network coded systems are vulnerable to pollution attacks 
where a single malicious node floods the network with bad packets and 
prevents the receiver from decoding correctly. Cryptographic defenses 
to these problems are based on homomorphic signatures and MACs. 

These proposals, however, cannot handle mixing of packets from multiple 
sources, which is needed to achieve the full benefits of network coding. 

In this paper we address integrity of multi-source mixing. We propose a 
security model for this setting and provide a generic construction. 

1 Introduction 

Network coding [dllfij is an elegant technique that replaces the traditional “store
and forward” paradigm of network routing by a method that allows routers to
transform the received data before re-transmission. It has been established that
for certain classes of networks, random linear coding is sufficient to improve
throughput HU. In addition, linear network codes offer robustness and
adaptability and have many practical applications (in wireless and sensor networks,
for example) fTO]. Due to these advantages, network coding has become very
popular.


On the other hand, networks using network coding are exposed to problems
that traditional networks do not face. A particularly important instance of this
is the pollution problem: if some routers in the network are malicious and
forward invalid combinations of received packets, then these invalid packets get
mixed with valid packets downstream and quickly pollute the whole network. In
addition, the receiver who obtains multiple packets has no way of ascertaining 
which of these are valid and should be used for decoding. Indeed, using even 
one invalid packet during the decoding process causes all the messages to be
decoded wrongly. For a detailed discussion of pollution attacks, we refer the reader
to | I4H9I12| .

To prevent the network from being flooded with invalid packets, it is desirable 
to have “hop-by-hop containment.” This means that even if a bad packet gets 
injected into the network, it is detected and discarded at the very next hop. Thus, 
it can be dropped before it is combined with any other packets, preventing its 
pollution from spreading. 

Hop-by-hop containment cannot be achieved by standard signatures or MACs.
As pointed out in [Tj , signing the message packets does not help since recipients
do not have the original message packets and therefore cannot verify the
signature. Nor does signing the entire message prior to transmission work, because
it forces the recipient to decode exponentially many subsets of received
packets to find a decoded message with a consistent signature. Thus, new integrity
mechanisms are needed to mitigate pollution attacks.

Previous Work. Security of network coding has been considered from both
the information-theoretic and cryptographic perspectives. In the former, the
adversary is modelled as having control over a limited number of links in the
network. Such approaches, though useful for wireline networks, have limited
application in wireless networks. For a detailed discussion of these techniques,
see e.g. unanj. Cryptographic techniques have also been proposed, e.g. in
[7117113121 . These authors construct digital signatures for signing a linear
subspace. If V is a subspace and $\sigma$ its signature, then there is a verification algorithm
which accepts the pair $(v, \sigma)$ for all $v \in V$, but it is difficult to construct a vector
$y \notin V$ for which the pair $(y, \sigma)$ verifies. An alternative approach is to use a MAC
(instead of a signature) for integrity of a linear subspace; see [lll8j .

While the signature and MAC schemes in iaiaiai4in are elegant, they are
quite limited: they only allow routers to combine vectors from a single sender.
(Furthermore, the constructions of |7fl7119| require a new public key to be
generated for each file, thus hurting efficiency.) Traditional network coding
assumes a network where many senders simultaneously send messages and network
routers linearly combine vectors from multiple senders. This setting is essential
in showing that network coding can improve the efficiency of 802.11 wireless
networks [T5].

Our Contribution. Our goal is to construct a signature mechanism that
provides integrity when network routers combine packets from many sources. This
problem is considerably harder than the single source problem. First, defining 
security is more difficult. It is necessary to model “insider” attacks where the 
attacker controls network routers as well as some senders. The attacker’s goal is 
to generate valid signatures on mixed packets; after decoding these packets the 
recipient believes that an honest sender sent a message M* that was never sent 
by the honest sender. 
More precisely, if there are s senders in the network, we allow the attacker to
control s − 1 of them. Furthermore, the attacker can mount a chosen message
attack on the single honest sender. The attacker’s goal is to generate a mixed 
packet with a valid signature that after decoding corresponds to an existential 
forgery on the single honest sender. 

In Section 3 we show that a natural generalization of the single-sender security
model in [1] to the multi-sender setting results in a model that cannot be satisfied.
We do this by constructing a generic attack against an abstract multi-source
network coding signature scheme. In Section 4 we present a security model that
captures the constraints of the multi-sender problem. Our model retains the 
desirable properties of the single-source model, such as hop-by-hop containment 
of forged packets, and is achievable. 

In Section 5 we present a construction satisfying our security model. We give
a generic construction from a new primitive called a vector hash, which captures
the properties of homomorphic hashing that are necessary to produce secure
signatures. In the full version of this paper [2] we show how to instantiate the
construction based on the discrete logarithm assumption. We also prove a lower
bound showing that our model necessitates a relatively space-inefficient
construction; our discrete log scheme (asymptotically) achieves this lower bound.

2 Network Coding 

We refer the reader to [...] for a detailed introduction to network coding. Here we
present a brief overview for completeness; this overview describes the operation
of a network coding system and is independent of any security model. We model
a network as a directed graph consisting of a set of vertices (or nodes) $V$ and a
set of edges $E$. We assume the graph is connected. A node that only transmits
data is called a source node. We start with the basic model, in which one source
wishes to transmit one file $F$ through the network. The source interprets the
data in $F$ as a set of $m$ vectors $v_1, \dots, v_m$ in an $n$-dimensional vector
space over a finite field $\mathbb{F}_p$. (The prime $p$ and the dimensions $n$ and $m$
are fixed parameters in the system.) We sometimes refer to individual vectors as
blocks or packets.
The source then appends a unit vector of length $m$ to each vector $v_i$ to create
$m$ augmented vectors $\mathbf{v}_1, \dots, \mathbf{v}_m$ given by

$$\mathbf{v}_i = (v_i, 0, \dots, 0, 1, 0, \dots, 0) \in \mathbb{F}_p^{n+m},$$

where the $1$ is in coordinate $n + i$.

The augmented vectors comprise the data to be transmitted through the network.
We call the first $n$ entries of the vector $\mathbf{v}_i$ the data component and the
last $m$ entries the augmentation component.

The “coding” part of network coding works as follows: an intermediate node in
the network receives some set of vectors $\mathbf{w}_1, \dots, \mathbf{w}_\ell$, chooses $\ell$
random elements $\beta_i \in \mathbb{F}_p$, and transmits the vector
$\mathbf{y} = \sum_{i=1}^{\ell} \beta_i \mathbf{w}_i$ along its outgoing edges. The key
property of the augmentation is that the augmentation component contains
exactly the linear combination coefficients used to construct $\mathbf{y}$. That is, we
know that $\mathbf{y} = \sum_{i=1}^{m} y_{n+i} \mathbf{v}_i$ even though the intermediate
node may never see the $\mathbf{v}_i$. This property allows any node that receives a
set of $m$ linearly independent vectors $\mathbf{y}_1, \dots, \mathbf{y}_m$ to recover the
original $\mathbf{v}_i$. Specifically, if we let $D$ be the $m \times n$ matrix whose $i$th
row consists of the data component of $\mathbf{y}_i$, and $A$ be the $m \times m$ matrix
whose $i$th row consists of the augmentation component of $\mathbf{y}_i$, then the
rows of $A^{-1} D$ are exactly the initial vectors $v_i$.
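The encode, combine, and decode steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy prime $p = 101$ and plain Python lists for vectors; the helper names `augment`, `combine`, and `decode` are hypothetical, and decoding is done by row-reducing $[A \mid D]$ to $[I \mid A^{-1}D]$ over $\mathbb{F}_p$.

```python
import random

P = 101  # small illustrative prime; deployed systems use a much larger p


def augment(data_vectors):
    """Append the unit vector e_i (length m) to the i-th data vector v_i."""
    m = len(data_vectors)
    return [list(v) + [1 if j == i else 0 for j in range(m)]
            for i, v in enumerate(data_vectors)]


def combine(vectors, p=P, rng=random):
    """Intermediate node: a random F_p-linear combination of received vectors."""
    coeffs = [rng.randrange(p) for _ in vectors]
    return [sum(c * w[k] for c, w in zip(coeffs, vectors)) % p
            for k in range(len(vectors[0]))]


def decode(received, n, p=P):
    """Recover the data vectors as the rows of A^{-1} D.

    D holds the data components (first n entries) of the m received vectors and
    A their augmentation components; we row-reduce [A | D] to [I | A^{-1} D].
    """
    m = len(received)
    M = [[x % p for x in y[n:]] + [x % p for x in y[:n]] for y in received]
    for col in range(m):
        pivot = next((r for r in range(col, m) if M[r][col]), None)
        if pivot is None:
            raise ValueError("received vectors are not linearly independent")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse (Python 3.8+)
        M[col] = [x * inv % p for x in M[col]]
        for r in range(m):
            if r != col and M[r][col]:
                factor = M[r][col]
                M[r] = [(x - factor * y) % p for x, y in zip(M[r], M[col])]
    return [row[m:] for row in M]
```

For example, with data vectors $(3,1,4)$ and $(1,5,9)$, any two linearly independent combinations of the augmented vectors decode back to the original data, because the last $m$ entries of each received vector record exactly the coefficients used.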

Since network coding consists of linearly combining vectors, the subspace 
spanned by the (augmented) vectors of a file remains invariant under network 
operations. Hence we can equivalently consider a file to be represented by the 
subspace spanned by the vectors that comprise it. 

Notation. We use $n$ to denote the dimension of the data space and $m$ to denote
the dimension of a vector subspace that represents a single file. The number of
files in the system is denoted by $f$. For $\mathbf{v} \in \mathbb{F}_p^{n+\ell}$, we will use $v$
to denote the data component of $\mathbf{v}$, i.e., the first $n$ coordinates of $\mathbf{v}$,
and $\beta_{\mathbf{v}}$ to denote the augmentation component of $\mathbf{v}$, i.e., the
remaining $\ell$ coordinates. When we use a vector space $V$ as input to or output
of an algorithm we assume that $V$ is described by an explicit basis
$\{\mathbf{v}_1, \dots, \mathbf{v}_\ell\}$. Such a basis is properly augmented if for
$i = 1, \dots, \ell$, the augmentation component $\beta_{\mathbf{v}_i}$ is the unit vector
$e_i$ with a 1 in the $i$th place.

We will refer to the augmented vectors that the source wishes to transmit as 
primitive vectors. Here, “primitive” alludes to the fact that these vectors have 
not been mixed with any other; their augmentation components are unit vectors. 
Aggregate vectors, on the other hand, refer to vectors that have been formed as 
a result of linearly combining primitive or other aggregate vectors. 

2.1 Multiple Sources, Multiple Files 

In general, networks may have multiple sources, each of which can transmit
multiple files into the network. We now describe this situation, assuming that all
nodes in the network are honest. In principle the network coding setup is the
same as in the single-source situation described above, but there is more
bookkeeping to do. This bookkeeping is implicit in previous work that considers
multiple sources (e.g., [...]); here we give an explicit description that we will use
in our discussion of security. The complication arises from the fact that the
intermediate nodes wish to combine vectors from files produced by different
sources, but each source knows nothing of what the other sources are doing.

In the single source case, each file is associated with a file identifier id. The 
identifier allows the receiver to group together packets that belong to the same 
file. This prevents, for example, delayed honest packets from a previous file 
transmission from being decoded along with the current file’s vectors. Hence 
each vector (primitive or aggregate) that traverses the system carries with it the 
identifier of the file it belongs to. 

In the multi-source case, the file identifier id plays an even more crucial role:
it allows the intermediate nodes to combine vectors arising from different
files. In this scenario, an aggregate vector may be associated with multiple files,
and the identifier attached to an aggregate vector $\mathbf{v}$ must carry with it the
identifiers of all of the files whose vectors went into making $\mathbf{v}$. Upon receiving
two vectors, each carrying a (probably different) list of identifiers id, an
intermediate node will need to “merge” the lists of identifiers into a common
list and adjust the two vectors’ augmentation components so that they can be
linearly combined.

For example, suppose a node receives two vectors $\mathbf{v}_1, \mathbf{v}_2 \in \mathbb{F}_p^{n+m}$
with identifiers $\mathrm{id}_1$ and $\mathrm{id}_2$, respectively. Splitting $\mathbf{v}_i$ into its
data and augmentation components, we write $\mathbf{v}_i = (v_i, \alpha_i)$. If
$\mathrm{id}_1 = \mathrm{id}_2$, then the vectors come from the same file, the situation is
analogous to the single-source case, and no additional adjustment is needed.
However, if $\mathrm{id}_1 \neq \mathrm{id}_2$, then the vectors came from different files and we
must introduce additional augmentation before we can linearly combine them. In
this case we define $\mathbf{v}'_1 = (v_1, \alpha_1, \mathbf{0})$ and
$\mathbf{v}'_2 = (v_2, \mathbf{0}, \alpha_2) \in \mathbb{F}_p^{n+2m}$, where $\mathbf{0}$ denotes a
length-$m$ zero vector. Thus when we compute a linear combination
$\mathbf{v} = a\mathbf{v}'_1 + b\mathbf{v}'_2$, the data components are mixed together but the
augmentation coefficients remain separate. We can then use the identifier
$\mathrm{id} = (\mathrm{id}_1, \mathrm{id}_2)$ to indicate which set of augmentation coefficients
corresponds to which file.
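Written out coordinate by coordinate, the combination in this example is as follows (a restatement of the definitions above, with $a, b \in \mathbb{F}_p$ the coding coefficients):

```latex
a\,\mathbf{v}'_1 + b\,\mathbf{v}'_2
  = a\,(v_1,\ \alpha_1,\ \mathbf{0}) + b\,(v_2,\ \mathbf{0},\ \alpha_2)
  = (\,a v_1 + b v_2,\ \ a\alpha_1,\ \ b\alpha_2\,) \in \mathbb{F}_p^{n+2m}
```

The augmentation block labeled $\mathrm{id}_1$ therefore holds $a\alpha_1$ and the block labeled $\mathrm{id}_2$ holds $b\alpha_2$, so the coefficients contributed by each file stay separately readable after mixing.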

More generally, we define an algorithm Merge that merges the lists of identifiers
contained in aggregate vectors and adjusts the vectors’ augmentations. This
algorithm is intrinsic to the multiple-source setting: it does not itself linearly
combine vectors, but rather prepares aggregate vectors (coming from different
sources, made up of different files) to be mixed together. If $\mathbf{v} \in \mathbb{F}_p^{n+mf}$
is an aggregate vector, we continue to call the first $n$ entries of $\mathbf{v}$ the data
component; we call the rest of $\mathbf{v}$ the augmentation component, and we divide
the augmentation component into $f$ augmentation blocks of length $m$. (Here and
in the remainder of the paper we assume for simplicity that the dimension $m$ is
the same for each file and is publicly known; the generalization to variable
dimension is straightforward.)

Algorithm 1. (Merge)

Input: lists of identifiers $\mathrm{id}_1, \mathrm{id}_2$ of lengths $f_1, f_2$, respectively, with
no repeated entries, and vectors $\mathbf{w}_i \in \mathbb{F}_p^{n+mf_i}$ for $i = 1, 2$.

Output: vectors $\mathbf{w}'_1, \mathbf{w}'_2 \in \mathbb{F}_p^{n+mf'}$ and a list of identifiers $\mathrm{id}'$
of length $f'$.

1. Let $\mathrm{id}'$ be the list whose entries are the union of the elements of $\mathrm{id}_1$
and $\mathrm{id}_2$, ordered in some predetermined way (e.g., lexicographically). Let $f'$
be the length of $\mathrm{id}'$.

2. For $i$ in $1, 2$, define $\mathbf{w}'_i \in \mathbb{F}_p^{n+mf'}$ by setting the data component
of $\mathbf{w}'_i$ equal to the data component of $\mathbf{w}_i$, and for $j$ in $1, \dots, f'$,
setting the $j$th augmentation block of $\mathbf{w}'_i$ as follows:

- If the $j$th element of $\mathrm{id}'$ is the $k$th element of $\mathrm{id}_i$, the $j$th
augmentation block of $\mathbf{w}'_i$ is equal to the $k$th augmentation block of $\mathbf{w}_i$.

- If the $j$th element of $\mathrm{id}'$ is not an element of $\mathrm{id}_i$, the $j$th
augmentation block of $\mathbf{w}'_i$ is $\mathbf{0}$.

3. Output the list $\mathrm{id}'$ and the vectors $\mathbf{w}'_1, \mathbf{w}'_2$.
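Algorithm 1 can be sketched directly in Python. This is an illustration under stated assumptions: vectors are plain lists of length $n + mf_i$, identifiers are strings, and sorted order plays the role of the “predetermined way” of ordering; the function name `merge` is hypothetical.

```python
def merge(ids1, ids2, w1, w2, n, m):
    """Sketch of Algorithm 1 (Merge).

    ids1, ids2: identifier lists without repeats; w_i has length n + m*len(ids_i).
    Returns (merged_ids, w1_prime, w2_prime), each w_prime of length n + m*f'.
    """
    merged = sorted(set(ids1) | set(ids2))  # Step 1: union in a fixed order
    out = []
    for ids, w in ((ids1, w1), (ids2, w2)):
        new = list(w[:n])  # Step 2: data component is copied unchanged
        for ident in merged:
            if ident in ids:
                k = ids.index(ident)  # ident is the k-th identifier of ids
                new += list(w[n + k * m: n + (k + 1) * m])  # k-th block of w
            else:
                new += [0] * m  # identifier absent from ids: zero block
        out.append(new)
    return merged, out[0], out[1]  # Step 3
```

On the two-file example from the text (with $n = 2$, $m = 1$), `merge(["id1"], ["id2"], [5, 7, 1], [2, 3, 1], 2, 1)` yields the lifted vectors `[5, 7, 1, 0]` and `[2, 3, 0, 1]`, matching $\mathbf{v}'_1 = (v_1, \alpha_1, \mathbf{0})$ and $\mathbf{v}'_2 = (v_2, \mathbf{0}, \alpha_2)$.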
The intermediate node can now compute a random linear combination $\mathbf{y}$ of
$\mathbf{w}'_1$ and $\mathbf{w}'_2$ and use the list $\mathrm{id}'$ as the identifier component of the
signature on $\mathbf{y}$. (In the example above we executed this algorithm on two
vectors, each with an identifier list of length $f_i = 1$.)

We also define an algorithm called MergeSpaces that uses the Merge algorithm 
to combine two files described as vector spaces. 

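Algorithm 2. (MergeSpaces)

Input: disjoint lists of identifiers $\mathrm{id}_1, \mathrm{id}_2$ and two vector spaces
$V = \mathrm{span}(\mathbf{v}_1, \dots, \mathbf{v}_k) \subseteq \mathbb{F}_p^{n+k}$ and
$W = \mathrm{span}(\mathbf{w}_1, \dots, \mathbf{w}_\ell) \subseteq \mathbb{F}_p^{n+\ell}$.
Output: a subspace $Z \subseteq \mathbb{F}_p^{n+k+\ell}$ and an identifier $\mathrm{id}'$.

1. Let $B$ be the set of nonzero vectors produced by

$$\mathrm{Merge}(\mathrm{id}_1, \mathrm{id}_2, \mathbf{v}_1, \mathbf{0}), \dots, \mathrm{Merge}(\mathrm{id}_1, \mathrm{id}_2, \mathbf{v}_k, \mathbf{0}),$$
$$\mathrm{Merge}(\mathrm{id}_1, \mathrm{id}_2, \mathbf{0}, \mathbf{w}_1), \dots, \mathrm{Merge}(\mathrm{id}_1, \mathrm{id}_2, \mathbf{0}, \mathbf{w}_\ell).$$

2. Let $\mathrm{id}'$ be the identifier output by any of the calls to Merge in Step (1).

3. Output $Z = \mathrm{span}(B)$ and $\mathrm{id}'$.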
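A self-contained sketch of MergeSpaces follows. It assumes bases are given as Python lists and, as in Algorithm 1, that sorted order fixes the merged identifier list; instead of calling Merge on pairs $(\mathbf{v}_i, \mathbf{0})$ and $(\mathbf{0}, \mathbf{w}_j)$, it inlines the same block re-indexing in a local helper `lift`. The names `merge_spaces` and `lift` are hypothetical.

```python
def merge_spaces(ids1, ids2, basis_v, basis_w, n, m):
    """Sketch of Algorithm 2 (MergeSpaces).

    Returns (merged_ids, B) where B is a list of nonzero vectors spanning Z.
    Assumes disjoint identifier lists, as the algorithm's input requires.
    """
    merged = sorted(set(ids1) | set(ids2))  # same order Merge would produce

    def lift(w, ids):
        # Re-index the augmentation blocks of w onto the merged identifier list,
        # exactly as Merge does for the nonzero argument of each call.
        out = list(w[:n])
        for ident in merged:
            if ident in ids:
                k = ids.index(ident)
                out += list(w[n + k * m: n + (k + 1) * m])
            else:
                out += [0] * m
        return out

    B = [lift(v, ids1) for v in basis_v] + [lift(w, ids2) for w in basis_w]
    B = [b for b in B if any(b)]  # Step 1 keeps only the nonzero vectors
    return merged, B               # Step 3: Z = span(B)
```

Because the lifted copies of $V$ and $W$ occupy disjoint augmentation blocks, $Z$ contains every network-coding combination of vectors from the two inputs.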

By applying MergeSpaces repeatedly using concatenated lists of identifiers, the 
algorithm generalizes to take any number of vector spaces and identifiers as 
input. The decoding operation works as before: given a set of vectors whose 
(merged) augmentation components form a full-rank matrix, we can recover the 
original data vectors by inverting this matrix. 


3 Signatures and File Identifiers 

For single sources, a network coding signature scheme consists of three algorithms,
Setup, Sign, and Verify, whose functionality corresponds to the usual notions for
a signature scheme. In this setting, the Sign algorithm produces signatures on a
vector space, and the Verify algorithm checks whether the signature is valid on
a given vector. In addition, both Sign and Verify take as input a file identifier
id, which binds a signature to a file. Informally, the correctness condition is that
if $\sigma$ is a signature on a vector space $V$ with identifier id, then for all
$\mathbf{v} \in V$, $\mathrm{Verify}(\mathrm{id}, \mathbf{v}, \sigma)$ outputs “accept.” (For formal definitions,
see [1, Section 3.1].)

For multiple sources, we need to add an additional algorithm Combine that
intermediate routers use to produce signatures on vectors that are linear
combinations of vectors from different files. More precisely, Combine takes as
input two tuples $(\mathbf{v}_i, \mathrm{id}_i, \sigma_i, a_i)$ for $i = 1, 2$, where the $\mathbf{v}_i$ are
vectors, the $\mathrm{id}_i$ are (lists of) identifiers, the $\sigma_i$ are signatures, and the $a_i$
are network coding coefficients. The algorithm outputs a signature $\sigma'$. The
correctness condition is that if $\sigma_i$ is a valid signature on $\mathbf{v}_i$ with identifier
$\mathrm{id}_i$ for $i = 1, 2$, then $\sigma'$ is a valid signature on $a_1\mathbf{v}'_1 + a_2\mathbf{v}'_2$ with
identifier $\mathrm{id}'$, where $\mathbf{v}'_1, \mathbf{v}'_2, \mathrm{id}'$ are output by
$\mathrm{Merge}(\mathrm{id}_1, \mathrm{id}_2, \mathbf{v}_1, \mathbf{v}_2)$.
In the single-source setting, the Sign algorithm takes id as input. Thus, a
vector $\mathbf{v}$ carries a pair $(\mathrm{id}, \sigma)$, where id is the file identifier chosen
arbitrarily and $\sigma$ is generated by Sign. In the multi-source case, however,
allowing senders to pick file identifiers gives them the ability to frame other users
in the system, so that receiver Bob can be made to believe that user Alice sent
him a packet which, in fact, Alice did not send. In most network-coded systems
with multiple senders, such as BitTorrent [...], insider attacks form the real
threat, so this attack has significant practical implications. Fortunately, this
attack can be thwarted by requiring that the file identifiers be cryptographically
verifiable. In the subsequent sections, we formalize these notions. We first
describe the attack, and then use the intuition gained from the attack to
construct a framework that circumvents it.

3.1 Generic Attack (For Arbitrary File Identifiers) 

Here we construct an attack against an abstract multi-source network coding 
signature scheme that consists of the algorithms Setup, Sign, Combine, Verify 
discussed above. We make no assumptions about these algorithms beyond their 
functionality. We show that it is impossible to achieve hop-by-hop containment 
if the identifier id is chosen arbitrarily by the sender and is given as input to the 
Sign algorithm. We construct a generic attack in which an intermediate node is 
fooled into accepting invalid packets as valid. As mentioned before, the attack is 
an “insider” attack where one of the senders is malicious. The malicious sender 
can assign two different vector spaces the same id and sign both using his secret 
key. An intermediate node has no hope of ever detecting this, since two packets 
constructed using these two vector spaces are both individually valid, but they 
are not pairwise valid, and can cause the receiver to incorrectly decode an honest 
user’s message. We make this formal below. 

We explain the attack with subspace dimension $m = 1$; the attack easily
generalizes to arbitrary $m$. In our system, the honest sender is Alice, the receiver
is Bob, and the malicious user is Mallet.

Honest User Alice. Alice wishes to send a file described as a single nonzero
vector $v_1 \in \mathbb{F}_p^n$. She sets $\mathbf{v}_1 = (v_1, 1)$, chooses a file identifier $\mathrm{id}_A$,
and uses her secret key $\mathrm{sk}_A$ to create a signature $\tau_1$ on the one-dimensional
subspace $V_1 \subseteq \mathbb{F}_p^{n+1}$ spanned by $\mathbf{v}_1$, with identifier $\mathrm{id}_A$. Then she
transmits the packet $P_1 = (\mathbf{v}_1, \mathrm{id}_A, \tau_1)$.

Malicious User Mallet. Mallet receives $P_1$ and does the following:

1. Generate a key pair $(\mathrm{sk}_M, \mathrm{pk}_M)$.

2. Pick two vectors $v_2, v_3 \in \mathbb{F}_p^n$ such that the set $\{v_1, v_2, v_3\}$ is
linearly independent. Let $V_2, V_3$ be the subspaces of $\mathbb{F}_p^{n+1}$ spanned by
$\mathbf{v}_2 = (v_2, 1)$ and $\mathbf{v}_3 = (v_3, 1)$, respectively.

3. Choose an identifier $\mathrm{id}_M \neq \mathrm{id}_A$, and use the key $\mathrm{sk}_M$ to compute
signatures $\tau_2, \tau_3$ on subspaces $V_2, V_3$, both with identifier $\mathrm{id}_M$. Create
the packets $P_2 = (\mathbf{v}_2, \mathrm{id}_M, \tau_2)$ and $P_3 = (\mathbf{v}_3, \mathrm{id}_M, \tau_3)$.

4. Run Merge on $(\mathbf{v}_1, \mathbf{v}_2)$ and $(\mathrm{id}_A, \mathrm{id}_M)$ to obtain
$\mathrm{id} = (\mathrm{id}_A, \mathrm{id}_M)$ and vectors $\mathbf{v}'_1 = (v_1, 1, 0)$, $\mathbf{v}'_2 = (v_2, 0, 1)$.

5. Run $\mathrm{Combine}((\mathbf{v}_1, \mathrm{id}_A, \tau_1, 1), (\mathbf{v}_2, \mathrm{id}_M, \tau_2, 1))$ to produce a
signature $\tau_4$ on the vector
$\mathbf{v}_4 = \mathbf{v}'_1 + \mathbf{v}'_2 = (v_1 + v_2, 1, 1) \in \mathbb{F}_p^{n+2}$. Let
$P_4 = (\mathbf{v}_4, \mathrm{id}, \tau_4)$.

6. Send $P_3$ and $P_4$ to Bob.

Receiver Bob. Bob receives $P_3$ and $P_4$, each of which passes the verification
test (by the correctness of Sign and Combine). Bob then tries to decode the
received data to recover Alice’s file.

Merging $P_3$ onto the identifier list $\mathrm{id} = (\mathrm{id}_A, \mathrm{id}_M)$ lifts $\mathbf{v}_3$ to
$(v_3, 0, 1)$. The identifier then indicates that $\mathbf{v}^* = \mathbf{v}_4 - (v_3, 0, 1)$ is a
primitive vector sent by Alice, since the augmentation component of $\mathbf{v}^*$ is
$(1, 0)$. However, the data part of $\mathbf{v}^*$ is $v_1 + v_2 - v_3$, which cannot be in
the subspace spanned by $v_1$ since $\{v_1, v_2, v_3\}$ is linearly independent. Thus
$\mathbf{v}^*$ is an invalid vector accepted by Bob.
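The linear algebra behind Bob’s decoding step can be checked concretely. The sketch below assumes a toy field $\mathbb{F}_{101}$, $n = 3$, and the standard basis vectors as $v_1, v_2, v_3$; it ignores signatures entirely (by correctness of Sign and Combine they all verify) and only shows that the vector Bob attributes to Alice has augmentation $(1, 0)$ yet lies outside $\mathrm{span}(v_1)$. The helpers `rank_mod_p` and `in_span` are hypothetical names.

```python
P = 101  # illustrative field size


def rank_mod_p(rows, p=P):
    """Rank of a list of vectors over F_p, by Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def in_span(x, basis, p=P):
    """True iff x lies in the F_p-span of basis."""
    return rank_mod_p(basis + [x], p) == rank_mod_p(basis, p)


# n = 3: Alice's data vector v1 and Mallet's v2, v3 (linearly independent).
v1, v2, v3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]

# Step 5: P4 carries v4 = v1' + v2' = (v1 + v2, 1, 1) under id = (id_A, id_M).
v4 = [(a + b) % P for a, b in zip(v1, v2)] + [1, 1]

# Bob lifts P3's vector (v3, 1) onto (id_A, id_M), giving (v3, 0, 1) ...
v3_merged = v3 + [0, 1]
# ... and subtracts it, obtaining what looks like a primitive vector from Alice.
v_star = [(a - b) % P for a, b in zip(v4, v3_merged)]

print("augmentation:", v_star[3:])                     # prints [1, 0]
print("data in span(v1)?", in_span(v_star[:3], [v1]))  # prints False
```

The data component is $(1, 1, -1) = v_1 + v_2 - v_3$, which no multiple of $v_1$ can equal.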

In the above attack, Mallet was able to frame Alice by secretly reusing $\mathrm{id}_M$ for
two different vector spaces. Note that this attack is more insidious than simply
inserting data with identifier $\mathrm{id}_A$, which would have the same effect of
corrupting Alice’s data. We see from this attack that arbitrary file identifiers give
a malicious insider too much power. It is thus necessary to tie the identifiers
cryptographically to the files they represent, in a way that is verifiable at every
node in the network. In particular, the Sign algorithm should output both an
identifier id and a signature $\sigma$. To verify the identifier we use an algorithm
IdTest that takes as input a public key pk, a vector $\mathbf{y}$, and a list of identifiers
id, and outputs “accept” if $\mathbf{y}$ is in the subspace $V$ identified by id. To avoid the
above attack, the following tasks must be infeasible for Mallet:

1. Given a public key $\mathrm{pk}_A$, find an identifier $\mathrm{id}_A$ and a vector $\mathbf{y}$ such that
$\mathrm{IdTest}(\mathrm{pk}_A, \mathbf{y}, \mathrm{id}_A)$ outputs “accept.” (This is a type of
“collision-resistance” property.)

2. Given a vector space $V$, a public key $\mathrm{pk}_A$, and $(\mathrm{id}_A, \sigma) := \mathrm{Sign}(\mathrm{sk}_A, V)$
(where $\mathrm{sk}_A$ is the secret key corresponding to $\mathrm{pk}_A$), find a $\mathbf{y} \notin V$ such
that $\mathrm{IdTest}(\mathrm{pk}_A, \mathbf{y}, \mathrm{id}_A)$ outputs “accept.” (This property is unique to
the network coding scenario.)

If Mallet can succeed at either task, then Bob is convinced that the vector y 
belongs to a file sent by Alice, when in fact it does not. (Indeed, in the first case 
Alice didn’t even send a file!) 

These two tasks are quite familiar: they are analogous to the two ways of breaking
a single-source network coding signature scheme [1, Section 3.1]. This analysis
leads to our key observation: the file identifier produced by Sign must itself
be a vector space signature. It follows that all the security properties of the
system are carried in the identifier id, so we can set the “signature” part $\sigma$ equal
to id or eliminate $\sigma$ entirely. We formalize these ideas in the following section.

Remark 3. One can show that allowing the use of arbitrary file identifiers not 
only makes hop-by-hop containment impossible, but also forces the receiver to 
solve the clique problem for proper decoding. Specifically, there is a formal
reduction from the clique problem to decoding in multi-source network coding;
details are in the full paper [2].
4 Network Coding Signatures

We formally define the multi-source network coding signature scheme. Here the
Sign algorithm generates an element $\sigma$ that is used both as a signature and as a
file identifier. The Verify algorithm implements the functionality of the IdTest
algorithm in the previous section and allows every node to validate the
identifier/signature of an incoming packet. Since signatures and identifiers play the
same role, the Combine algorithm provides the same functionality as the Merge
algorithm of Section 2, while also keeping track of the public keys involved. Note
that in contrast to traditional signatures, the Verify algorithm does not take as
input the original message (i.e., the vector space).

Definition 4. A multi-source network coding signature scheme is a tuple of
five PPT algorithms, Setup, KeyGen, Sign, Combine, Verify, with the following
properties:

Setup$(1^\lambda, n, m)$: On input the unary representation of a security parameter
$1^\lambda$, a data space dimension $n$, and a subspace dimension $m$, outputs a
description of system parameters params. This description includes the prime $p$
used to define the field over which vector spaces are defined, as well as $n$ and $m$.

KeyGen(params): Outputs a randomly generated user key pair (sk, pk).

Sign(params, sk, $V$): On input a secret key sk and a subspace $V \subseteq \mathbb{F}_p^{n+m}$,
outputs a signature $\sigma$.

Combine(params, $(\mathbf{v}_1, \vec{\sigma}_1, \vec{\mathrm{pk}}_1, a_1)$, $(\mathbf{v}_2, \vec{\sigma}_2, \vec{\mathrm{pk}}_2, a_2)$):
Takes as input two vectors $\mathbf{v}_1 \in \mathbb{F}_p^{n+mf_1}$ and $\mathbf{v}_2 \in \mathbb{F}_p^{n+mf_2}$, two
lists of signatures $\vec{\sigma}_1, \vec{\sigma}_2$, two lists of public keys $\vec{\mathrm{pk}}_1, \vec{\mathrm{pk}}_2$,
and two coefficients $a_1, a_2 \in \mathbb{F}_p$. The algorithm outputs a list of signatures
$\vec{\sigma}$ and a list of public keys $\vec{\mathrm{pk}}$.

Verify(params, $\vec{\mathrm{pk}}$, $\mathbf{v}$, $\vec{\sigma}$): On input a list of public keys $\vec{\mathrm{pk}}$, a
vector $\mathbf{v} \in \mathbb{F}_p^{n+mf}$, and a list of signatures $\vec{\sigma}$, outputs $\top$ (accept) or
$\bot$ (reject).


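Correctness. We require that for any set of system parameters determined by
Setup$(1^\lambda, n, m)$, the following hold:

1. For primitive signatures: Consider a key pair $(\mathrm{sk}, \mathrm{pk}) \leftarrow \mathrm{KeyGen}(\mathrm{params})$
and a vector space $V \subseteq \mathbb{F}_p^{n+m}$. Let $\sigma$ be the output of
Sign(params, sk, $V$). Let $\vec{\mathrm{pk}} = \{\mathrm{pk}\}$ and $\vec{\sigma} = \{\sigma\}$. Then for all
$\mathbf{v} \in V$, we require that $\mathrm{Verify}(\mathrm{params}, \vec{\mathrm{pk}}, \mathbf{v}, \vec{\sigma}) = \top$.

2. Recursively, for combined signatures: Consider two lists of public keys
$\vec{\mathrm{pk}}_1, \vec{\mathrm{pk}}_2$, two vectors $\mathbf{v}_1, \mathbf{v}_2$, and two lists of signatures
$\vec{\sigma}_1, \vec{\sigma}_2$ such that

$$\mathrm{Verify}(\mathrm{params}, \vec{\mathrm{pk}}_1, \mathbf{v}_1, \vec{\sigma}_1) = \mathrm{Verify}(\mathrm{params}, \vec{\mathrm{pk}}_2, \mathbf{v}_2, \vec{\sigma}_2) = \top.$$

Let $\mathbf{v}'_1, \mathbf{v}'_2, \vec{\sigma}'$ be the output of
$\mathrm{Merge}(\vec{\sigma}_1, \vec{\sigma}_2, \mathbf{v}_1, \mathbf{v}_2)$. For any $a_1, a_2 \in \mathbb{F}_p$,
we require that if $\vec{\sigma}, \vec{\mathrm{pk}}$ is the output of the Combine algorithm on
inputs $(\mathbf{v}_1, \vec{\sigma}_1, \vec{\mathrm{pk}}_1, a_1)$, $(\mathbf{v}_2, \vec{\sigma}_2, \vec{\mathrm{pk}}_2, a_2)$, then: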
(a) $\vec{\sigma}' = \vec{\sigma}$,

(b) For $j$ in $1, \dots, f = |\vec{\sigma}|$, if the $j$th element of $\vec{\sigma}$ is the $k$th
element of $\vec{\sigma}_i$ for $i \in \{1, 2\}$, then the $j$th element of $\vec{\mathrm{pk}}$ is the $k$th
element of $\vec{\mathrm{pk}}_i$.

(c) $\mathrm{Verify}(\mathrm{params}, \vec{\mathrm{pk}}, a_1\mathbf{v}'_1 + a_2\mathbf{v}'_2, \vec{\sigma}) = \top$.

In the second correctness condition, (a) tells us that identifiers and signatures
play the same role, while (b) requires that the list of public keys produced by
Combine corresponds (in a natural way) to the list of identifiers produced by
Merge.

4.1 Security 

The security game captures the requirement that, if the system is secure, even
an attacker who controls all sources but one and is given a chosen-message oracle
for the honest source cannot create an existential forgery on the honest source.
The game between a challenger and an adversary $\mathcal{A}$ with respect to a signature
scheme $\mathcal{S}$ proceeds as follows.

Init. The challenger runs Setup$(1^\lambda, n, m)$ to obtain system parameters params
and runs KeyGen(params) to obtain sk* and pk*. It sends pk* and params to
$\mathcal{A}$ and keeps sk* to itself.

Signature queries. $\mathcal{A}$ adaptively requests signatures for vector spaces
$V_1, \dots, V_\ell \subseteq \mathbb{F}_p^{n+m}$. The challenger responds by computing
Sign(params, sk*, $V_i$) for $i = 1, \dots, \ell$ and sends the resulting signatures to $\mathcal{A}$.

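Forgery attempt. $\mathcal{A}$ eventually outputs a 4-tuple
$(\vec{\mathrm{pk}}^\dagger, \mathbf{v}^\dagger, \vec{\sigma}^\dagger, W^\dagger)$, where $\vec{\mathrm{pk}}^\dagger$ is a list of $f$
(not necessarily distinct) public keys $\vec{\mathrm{pk}}^\dagger = (\mathrm{pk}_1, \dots, \mathrm{pk}_f)$ that
contains the challenge public key pk*, $\mathbf{v}^\dagger$ is a nonzero vector in
$\mathbb{F}_p^{n+mf}$, $\vec{\sigma}^\dagger$ is a list of $f$ signatures, and
$W^\dagger = \mathrm{span}\{\mathbf{w}_1, \dots, \mathbf{w}_t\} \subseteq \mathbb{F}_p^{n+t}$ for some $t$.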

Adjudication. Let $\vec{\sigma}^\dagger = (\sigma_1, \dots, \sigma_f)$ be the list of (distinct)
identifiers output by $\mathcal{A}$, where, w.l.o.g., we assume the first $k$ components
$\sigma_1, \dots, \sigma_k$ are returned as the signatures for the chosen-message queries
$V_1, \dots, V_k$, $k \le \ell$. Let $\vec{\sigma}_W$ be the last $f - k$ elements of $\vec{\sigma}^\dagger$. Let
$V^*$ be the vector space output by
$\mathrm{MergeSpaces}(V_1, \dots, V_k, W^\dagger, \sigma_1, \dots, \sigma_k, \vec{\sigma}_W)$.

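The forger wins the game if $\mathrm{Verify}(\mathrm{params}, \vec{\mathrm{pk}}^\dagger, \mathbf{v}^\dagger, \vec{\sigma}^\dagger) = \top$
and at least one of the following two conditions holds:

1. There exists $i$ in $1, \dots, f$ such that the $i$th component of $\vec{\mathrm{pk}}^\dagger$ is
equal to pk*, but $\sigma_i$ is not any of the signatures obtained in response to
chosen-message queries.

2. For $i = 1, \dots, t$, we have $\mathrm{Verify}(\mathrm{params}, \vec{\mathrm{pk}}_W, \mathbf{w}_i, \vec{\sigma}_W) = \top$,
but $\mathbf{v}^\dagger \notin V^*$.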

Definition 5. The advantage NC-Adv$[\mathcal{A}, \mathcal{S}]$ of $\mathcal{A}$ is defined to be the
probability that $\mathcal{A}$ wins the security game. A multi-source network coding
scheme $\mathcal{S}$ is secure if for all probabilistic polynomial-time adversaries $\mathcal{A}$ the
advantage NC-Adv$[\mathcal{A}, \mathcal{S}]$ is negligible in the security parameter $\lambda$.
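The winning predicate of the game can be written as a small function. This is a sketch under explicit assumptions: because Verify is abstract here, its outcomes are passed in as booleans, and a basis of $V^*$ is assumed to have been computed elsewhere with MergeSpaces. The membership test $\mathbf{v}^\dagger \notin V^*$ is a rank comparison over a toy field $\mathbb{F}_{101}$. The names `forger_wins` and `_rank` are hypothetical.

```python
P = 101  # illustrative prime


def _rank(rows, p=P):
    """Rank of a list of vectors over F_p (Gaussian elimination)."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def forger_wins(v_ok, pk_list, pk_star, sigmas, queried, w_ok,
                v_dagger, v_star_basis, p=P):
    """Winning predicate of the security game (sketch).

    v_ok:         outcome of Verify(params, pk_dagger, v_dagger, sigma_dagger)
    w_ok:         outcomes of Verify on the basis vectors w_1..w_t of W_dagger
    v_star_basis: a basis of V* = MergeSpaces(...), assumed precomputed
    """
    if not v_ok:
        return False
    # Condition 1: a signature under pk* that the challenger never issued.
    cond1 = any(pk == pk_star and s not in queried
                for pk, s in zip(pk_list, sigmas))
    # Condition 2: all of W_dagger verifies, yet v_dagger lies outside V*.
    outside = _rank(v_star_basis + [v_dagger], p) > _rank(v_star_basis, p)
    cond2 = all(w_ok) and outside
    return cond1 or cond2
```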



Preventing Pollution Attacks in Multi-source Network Coding 




In the security game, the attacker requests signatures for files $V_1, \dots, V_\ell$ and
creates his own file $W^\dagger$. Intuitively, $W^\dagger$ corresponds to the vector space (the
set of files) whose data the adversary mixes with the honest user’s data in order
to frame the honest user. Winning condition (1) implies that the attacker can
create a valid fake signature for one of the files that he requests signatures for,
i.e., for a file signed with sk*. Winning condition (2) implies that the attacker
can produce a fake file $W^\dagger$ whose basis vectors pass the verification test, and a
vector $\mathbf{v}^\dagger$ that passes the verification test but lives outside the subspace $V^*$
that is the span of network coding combinations of the files he requested and
created. A receiver that decodes the basis vectors of $W^\dagger$ together with the
vector $\mathbf{v}^\dagger$ will be fooled into accepting a vector from the user with public key
pk* that this user never sent.

Implied properties. The security model implies that even given the secret key
sk, no PPT adversary can construct distinct vector spaces $V_1, V_2 \subseteq \mathbb{F}_p^{n+m}$
such that Sign(params, sk, $V_1$) = Sign(params, sk, $V_2$). Note, however, that this is
no ordinary collision-resistance property. During signature verification the vector
space $V$ is not available, and therefore the Verify algorithm must validate the
signature given only $\mathbf{y} \in V$.

This collision resistance property is crucial during decoding. The decoder 
collects all incoming packets with a specific identifier into a full rank matrix 
and runs the decoding procedure. Collision resistance ensures that all packets 
with the same signature belong to the same vector space. 

To see that this collision-resistance property follows from our definition, note
that it is not difficult to give a generic attack that works on any scheme for which
this property is not satisfied. The attack, in fact, is essentially the same as the
attack presented in Section 3.1.

The vector space $W^\dagger$. Recall that the forgery attempt by the adversary
consists of the 4-tuple $(\vec{\mathrm{pk}}^\dagger, \mathbf{v}^\dagger, \vec{\sigma}^\dagger, W^\dagger)$, where $\vec{\mathrm{pk}}^\dagger$ is a vector
of public keys containing the challenge public key pk*. The other public keys in
the vector $\vec{\mathrm{pk}}^\dagger$ are invented by the adversary, and it is therefore possible that
the adversary knows the corresponding private keys.

The vector $\mathbf{v}^\dagger$ and the signature $\vec{\sigma}^\dagger$ are the adversary’s existential forgery.
Suppose that $(\mathbf{v}^\dagger, \vec{\sigma}^\dagger)$ verify as a valid vector-signature pair with respect to
$\vec{\mathrm{pk}}^\dagger$. We require the adversary to output the vector space $W^\dagger$ to prove that
he is capable of exploiting $\mathbf{v}^\dagger$ to fool a recipient into incorrectly accepting a
vector from the single honest sender. To fool the recipient, the attacker can
generate valid vector-signature pairs for all basis vectors of $W^\dagger$ using the secret
keys at his disposal. Since all these vectors have valid signatures, a recipient
might try to decode the basis of $W^\dagger$ along with the vector $\mathbf{v}^\dagger$. If
$\mathbf{v}^\dagger \notin V^*$, then after decoding this set of vectors (i.e., after subtracting from
$\mathbf{v}^\dagger$ its projection onto $W^\dagger$), the recipient obtains a vector $\mathbf{u}$ that he believes
came from the honest sender, but which the honest sender never sent, since $\mathbf{u}$ is
not in $\mathrm{MergeSpaces}(V_1, \dots, V_k, \sigma_1, \dots, \sigma_k)$.

Hence, if the attacker is capable of producing a forgery for which condition
(2) of adjudication holds, then an adversary can fool a recipient by sending it
a sequence of properly signed vectors. We would like to require that for a secure
signature scheme it should be impossible to produce a valid forgery where
$\mathbf{v}^\dagger \notin V^*$. Unfortunately, this strong requirement appears to be unsatisfiable.
We therefore weaken it, requiring a forgery with $\mathbf{v}^\dagger \notin V^*$ to be infeasible
only when there is a possibility that the vectors in $W^\dagger$ will be jointly decoded
with $\mathbf{v}^\dagger$, namely when $\mathrm{Verify}(\mathrm{params}, \vec{\mathrm{pk}}_W, \mathbf{w}_i, \vec{\sigma}_W) = \top$ for all
basis vectors $\mathbf{w}_i$ of $W^\dagger$. This is an acceptable weakening of the security
requirement since the decoder will never group together vectors with different
identifiers. In Section 5 we show that the resulting definition is satisfiable.

We note that requiring the adversary to output $W^\dagger$ is analogous to the security
model of aggregate signatures [5], where the attacker outputs an aggregate
signature from $s$ public keys, $s - 1$ of which are invented by the attacker.
Moreover, the attacker must output the list of $s - 1$ messages that went into the
aggregate forgery, one for each of the public keys the attacker invented. Our vector
space $W^\dagger$ plays the same role as the $s - 1$ messages in the aggregate forgery.

5 Construction of a Multi-source Signature Scheme 

In this section, we construct an explicit multi-source network coding signature
scheme satisfying Definition 4. In order to give a generic construction, we first
define an auxiliary primitive called a vector hash. This primitive captures the
properties of the homomorphic hashes used by Krohn et al. [...] that are necessary
for secure signatures.

5.1 Vector Hashes 

A vector hash consists of three algorithms, HashSetup, Hash, and Test, with the
following properties:

HashSetup$(1^\lambda, n)$: Input: unary representation of a security parameter $\lambda$ and
the dimension $n$ of the data space. Output: public parameters pp.

Hash(pp, $v$): Input: public parameters pp and a vector $v \in \mathbb{F}_p^n$. Output: a hash
$h$ of the vector $v$. We require that this algorithm be deterministic.

Test(pp, $y$, $\beta$, $\vec{h}$): Input: public parameters pp, a vector $y \in \mathbb{F}_p^n$, a vector
of coefficients $\beta \in \mathbb{F}_p^m$, and a vector $\vec{h}$ of $m$ hash values. Output: $\top$ (true)
or $\bot$ (false).

Let $\vec{h}$ be a set of hashes of a basis $\mathbf{v}_1, \dots, \mathbf{v}_m$ of a vector space $V$.
Intuitively, we want the Test algorithm to tell us whether $y$ was constructed
correctly from the basis, i.e., whether $y = \sum_{i=1}^{m} \beta_i v_i$. This means that
Test should output $\top$ whenever $y$ is constructed correctly, and it should be
difficult for an adversary to find a vector $y \notin V$ and a $\beta$ such that Test outputs
$\top$. We now formalize these correctness and security conditions.




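Correctness. For correctness, we require the following for all public parameters
pp $\leftarrow$ HashSetup$(1^\lambda, n)$:

1. For all $v \in \mathbb{F}_p^n$, if $h \leftarrow \mathrm{Hash}(\mathrm{pp}, v)$ then we have
$\mathrm{Test}(\mathrm{pp}, v, 1, h) = \top$.

2. Let $v \in \mathbb{F}_p^n$, let $\beta \in \mathbb{F}_p^\ell$ for some $\ell$, and let $\vec{h}$ be a list of hashes of
length $\ell$. Fix $i \in \{0, \dots, \ell\}$, let $\beta' \in \mathbb{F}_p^{\ell+1}$ be the vector $\beta$ with a
zero inserted between the $i$th and $(i+1)$th places, and let $\vec{h}'$ be the vector $\vec{h}$
with any hash value inserted between the $i$th and $(i+1)$th places. We require
that if $\mathrm{Test}(\mathrm{pp}, v, \beta, \vec{h}) = \top$, then $\mathrm{Test}(\mathrm{pp}, v, \beta', \vec{h}') = \top$.

3. Let $v_1, v_2 \in \mathbb{F}_p^n$, let $\beta_1, \beta_2 \in \mathbb{F}_p^\ell$ for some $\ell$, and let $\vec{h}$ be a list
of hashes of length $\ell$. Let $a, b \in \mathbb{F}_p$, let $y = a v_1 + b v_2$, and let
$\beta = a\beta_1 + b\beta_2$. We require that if $\mathrm{Test}(\mathrm{pp}, v_i, \beta_i, \vec{h}) = \top$ for
$i = 1, 2$, then $\mathrm{Test}(\mathrm{pp}, y, \beta, \vec{h}) = \top$.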
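To make the three correctness conditions concrete, here is a toy vector hash in the style of discrete-log homomorphic hashing: $\mathrm{Hash}(v) = \prod_j g_j^{v_j}$ and Test checks $\prod_j g_j^{y_j} = \prod_i h_i^{\beta_i}$. This is an assumption-laden sketch, not the instantiation given in the full paper [2]: it works in the order-$Q$ subgroup of $\mathbb{Z}_P^*$ for the tiny safe prime $P = 2Q + 1 = 2039$, so vectors live over $\mathbb{F}_Q$, and the parameters offer no security. The names `hash_setup`, `vh_hash`, and `vh_test` are hypothetical.

```python
import random

Q = 1019       # toy prime: field size and group order (far too small to be secure)
P = 2 * Q + 1  # 2039 is prime, so the squares mod P form a subgroup of order Q


def hash_setup(n, rng=random):
    """Pick n generators g_1..g_n of the order-Q subgroup of Z_P^*."""
    gens = []
    while len(gens) < n:
        g = pow(rng.randrange(2, P - 1), 2, P)  # a square, hence in the subgroup
        if g != 1:  # any element other than 1 generates a group of prime order
            gens.append(g)
    return gens


def vh_hash(pp, v):
    """Deterministic hash of v in F_Q^n: prod_j g_j^{v_j} mod P."""
    h = 1
    for g, x in zip(pp, v):
        h = h * pow(g, x % Q, P) % P
    return h


def vh_test(pp, y, beta, hashes):
    """Accept iff prod_j g_j^{y_j} == prod_i h_i^{beta_i} (mod P)."""
    rhs = 1
    for h, b in zip(hashes, beta):
        rhs = rhs * pow(h, b % Q, P) % P
    return vh_hash(pp, y) == rhs
```

Condition (1) holds because $h^1 = \mathrm{Hash}(v)$, condition (2) because any hash raised to the power $0$ is $1$, and condition (3) because exponents add under multiplication and the group has order $Q$, so coefficient arithmetic mod $Q$ is consistent.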


Security. Let VH = (HashSetup, Hash, Test) be a vector hash. Let $\mathcal{A}$ be a PPT
algorithm that takes as input public parameters pp $\leftarrow$ HashSetup$(1^\lambda, n)$ and
outputs a vector $v^* \in \mathbb{F}_p^n$, an $m$-dimensional vector space $V \subseteq \mathbb{F}_p^n$ (for
some $m$) represented as basis vectors $\mathbf{v}_1, \dots, \mathbf{v}_m$, an $m$-tuple of coefficients
$\beta$, and a vector of hashes $\vec{h} = (h_1, \dots, h_m)$.

Definition 6. With notation as above, we say that $\mathcal{A}$ breaks the vector hash
scheme VH if $v^* \notin V$, $\mathrm{Test}(\mathrm{pp}, v^*, \beta, \vec{h}) = \top$, and
$\mathrm{Test}(\mathrm{pp}, v_i, e_i, \vec{h}) = \top$ for $i = 1, \dots, m$. (Recall that $v_i$ is the data
component of $\mathbf{v}_i$.) We define the advantage Hash-Adv$[\mathcal{A}, \mathrm{VH}]$ of $\mathcal{A}$ to be the
probability that $\mathcal{A}$ breaks VH. We say that a vector hash VH is secure if for all
PPT algorithms $\mathcal{A}$ the advantage Hash-Adv$[\mathcal{A}, \mathrm{VH}]$ is negligible in the
security parameter $\lambda$.

In the full paper [2] we give an example vector hash using a finite cyclic group 
G of order p. This vector hash is secure if the discrete logarithm problem is 
infeasible in G. 

5.2 The Construction 

For this construction, we use as a black box a vector hash as defined in
Section 5.1.

Signature Scheme $\mathcal{NS}$. Let VH = (HashSetup$_h$, Hash$_h$, Test$_h$) be a vector
hash and let $\mathcal{S}$ = (Setup$_s$, KeyGen$_s$, Sign$_s$, Verify$_s$) be a signature scheme for
signing messages in $\{0,1\}^*$. Our network coding signature scheme is as follows:

Setup$(1^\lambda, n, m)$: Run HashSetup$_h(1^\lambda, n)$ to obtain hash parameters and
Setup$_s(1^\lambda)$ to obtain signature parameters. Let params contain $m$, $n$, and
the outputs of these algorithms.

KeyGen(params): Run KeyGen$_s$ to obtain a public key pk and a private key sk.
Output (pk, sk).
Sign(params, sk, $\{\mathbf{v}_1, \dots, \mathbf{v}_m\}$): For $i = 1, \dots, m$ set
$h_i := \mathrm{Hash}_h(\mathrm{params}, v_i)$. Set $\vec{h} = (h_1, \dots, h_m)$,
$\eta := \mathrm{Sign}_s(\mathrm{sk}, \vec{h})$, and $\sigma := (\vec{h}, \eta)$. Output $\sigma$.

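Combine(params, $(\mathbf{v}_1, \vec{\sigma}_1, \vec{\mathrm{pk}}_1, a_1)$, $(\mathbf{v}_2, \vec{\sigma}_2, \vec{\mathrm{pk}}_2, a_2)$):

1. Let $(\vec{\sigma}', \mathbf{v}'_1, \mathbf{v}'_2) := \mathrm{Merge}(\vec{\sigma}_1, \vec{\sigma}_2, \mathbf{v}_1, \mathbf{v}_2)$.

2. To create a list $\vec{\mathrm{pk}}$, do the following: for $j$ in $1, \dots, f = |\vec{\sigma}'|$, if the
$j$th element of $\vec{\sigma}'$ is the $k$th element of $\vec{\sigma}_i$ for $i \in \{1, 2\}$, then the $j$th
element of $\vec{\mathrm{pk}}$ is the $k$th element of $\vec{\mathrm{pk}}_i$.

3. Output $\vec{\sigma}'$ and $\vec{\mathrm{pk}}$.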
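The key-tracking part of Combine can be sketched as follows. This is a minimal illustration, assuming signatures are hashable values (strings here) that double as identifiers, that sorted order is the fixed ordering used by Merge, and that the vector re-indexing of Step 1 happens exactly as in Algorithm 1; the function name `combine_keys` is hypothetical.

```python
def combine_keys(sigs1, pks1, sigs2, pks2):
    """Merge two signature lists and align the public keys with the result.

    Signatures double as file identifiers, so they are merged exactly like the
    identifier lists in Merge; sorted order stands in for the fixed ordering.
    """
    merged = sorted(set(sigs1) | set(sigs2))  # Step 1 (identifier part)
    pks = []
    for s in merged:
        # Step 2: the j-th key is the k-th key of whichever list holds s at index k.
        if s in sigs1:
            pks.append(pks1[sigs1.index(s)])
        else:
            pks.append(pks2[sigs2.index(s)])
    return merged, pks  # Step 3
```

When the same signature appears in both inputs (the same file reached the router along two paths), it appears once in the output together with its key, just as a repeated identifier does in Merge.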

Verify(params, pk, y, $\sigma$): Interpret $\sigma$ as a list of $f$ signatures where each $\sigma_i =
(\mathbf{h}_i, \eta_i)$. Write $H = (\mathbf{h}_1, \ldots, \mathbf{h}_f)$. Do the following:

1. For $i = 1, \ldots, f$ compute $\mathsf{Verify}_s(pk_i, \mathbf{h}_i, \eta_i)$.

2. Compute $\mathsf{Test}_h(\text{params}, y, \beta_y, H)$. (Recall that $\beta_y$ is the augmentation
component of $y$.)

If all steps output $\top$, output $\top$; else output $\bot$.
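To make the data flow of Sign and Verify concrete for a single file ($f = 1$), here is a toy sketch. It assumes a Pedersen-style vector hash: $\mathsf{Hash}(v) = \prod_j g_j^{v_j}$, and $\mathsf{Test}$ checks $\prod_i h_i^{\beta_i} = \mathsf{Hash}(y)$. It uses textbook Schnorr signatures as the underlying scheme $\mathcal{S}$. Both instantiations, the tiny group and all names are hypothetical choices for illustration, not the paper's instantiation. Augmented vectors are plain lists: the first $n$ entries are the data and the last $m$ entries are the augmentation.

```python
import hashlib
import secrets

P, p = 2039, 1019  # toy safe prime P = 2p + 1 and prime subgroup order p (insecure sizes)
g = 4              # 4 = 2^2 is a nontrivial square mod P, so it generates the order-p subgroup

def hash_setup(n):
    """Hypothetical vector-hash parameters: n squares of small integers (order p)."""
    return [pow(j + 2, 2, P) for j in range(n)]

def vhash(pp, v):
    """Pedersen-style hash of a data vector: prod_j g_j^{v_j} mod P."""
    h = 1
    for gj, vj in zip(pp, v):
        h = h * pow(gj, vj % p, P) % P
    return h

def hash_test(pp, y, beta, hs):
    """Accept iff prod_i h_i^{beta_i} equals the hash of the data vector y."""
    lhs = 1
    for h, b in zip(hs, beta):
        lhs = lhs * pow(h, b % p, P) % P
    return lhs == vhash(pp, y)

def _challenge(R, msg):
    digest = hashlib.sha256(f"{R}|{msg}".encode()).digest()
    return int.from_bytes(digest, "big") % p

def schnorr_keygen():
    x = secrets.randbelow(p - 1) + 1
    return x, pow(g, x, P)

def schnorr_sign(x, msg):
    k = secrets.randbelow(p - 1) + 1
    e = _challenge(pow(g, k, P), msg)
    return e, (k + e * x) % p

def schnorr_verify(pk, msg, sig):
    e, s = sig
    R = pow(g, s, P) * pow(pk, (-e) % p, P) % P  # g^s * pk^(-e) = g^k for honest signatures
    return _challenge(R, msg) == e

def sign(pp, sk, basis):
    """Hash the data part of each augmented basis vector, then sign the hash vector."""
    n = len(pp)
    hs = tuple(vhash(pp, v[:n]) for v in basis)
    return hs, schnorr_sign(sk, hs)

def verify(pp, pk, y, sigma):
    """f = 1: check the base signature, then run Test on y's data and augmentation."""
    n = len(pp)
    hs, eta = sigma
    return schnorr_verify(pk, hs, eta) and hash_test(pp, y[:n], y[n:], hs)
```

The two steps of `verify` mirror steps 1 and 2 of Verify above. A linear combination of the signed basis vectors verifies under the same signature, while changing only its augmentation makes step 2 fail.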

The only difference between the Combine algorithm in our signature scheme and
the Merge algorithm of Section 2.1 is that the Combine algorithm also keeps
track of the public keys associated with the signatures.
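Step 2 of Combine is pure bookkeeping. Below is a minimal sketch. It assumes, as our own representation rather than the paper's, that signature lists and key lists are Python lists and that each per-file signature appears at most once in each input list. The function name is hypothetical.

```python
def combine_public_keys(sigma_prime, sigma1, pk1, sigma2, pk2):
    """Build pk' so that pk'[j] is the key under which sigma_prime[j] verifies.

    For each position j, find i in {1, 2} and an index k with sigma_i[k] == sigma_prime[j],
    then copy pk_i[k]. Raise ValueError if a merged signature appears in neither list.
    """
    pk_prime = []
    for s in sigma_prime:
        if s in sigma1:
            pk_prime.append(pk1[sigma1.index(s)])
        elif s in sigma2:
            pk_prime.append(pk2[sigma2.index(s)])
        else:
            raise ValueError("merged signature not found in either input list")
    return pk_prime
```

For example, with $\sigma_1 = (s_A, s_B)$ and $\sigma_2 = (s_B, s_C)$, a merged list $(s_A, s_B, s_C)$ yields $(pk_A, pk_B, pk_C)$. The real Merge algorithm also returns the re-indexed vectors $v'_1, v'_2$, which this sketch does not model.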

Instead of sending a separate hash signature $\eta_i$ in each $\sigma_i$, we can aggregate
these signatures together for space efficiency. In the full paper [2] we describe an
instantiation of the system where a signature on $f$ files is of length $(fm + 1)\log_2 p$
bits. We also prove a lower bound showing that for large values of $f$ and $m$ this
length is optimal.

Correctness. We verify the correctness conditions of Definition 0] 

1. For primitive signatures: Consider a key pair $(sk, pk) \leftarrow \mathsf{KeyGen}(\text{params})$
and a vector space $V \subset \mathbb{F}_p^{n+m}$ described by a properly augmented basis
$v_1, \ldots, v_m$. Let $\sigma$ be the output of Sign(params, sk, $V = \{v_1, \ldots, v_m\}$).
Interpret the signature $\sigma$ as $\sigma = (\mathbf{h}, \eta)$.

For primitive signatures there is only one file, $f = 1$. We examine each
step of Verify in turn:

1. Since $\eta = \mathsf{Sign}_s(sk, \mathbf{h})$, we have $\mathsf{Verify}_s(pk, \mathbf{h}, \eta) = \top$ by correctness of $\mathcal{S}$.

2. Since $h_i = \mathsf{Hash}_h(\text{params}, v_i)$, and $\beta_{v_i}$ is the unit vector $e_i$ because we are
using a properly augmented basis, correctness conditions (1) and (3) of
VH imply that $\mathsf{Test}_h(\text{params}, v_i, \beta_{v_i}, \mathbf{h}) = \top$.

It follows that every basis vector $v_i$ passes the signature verification test,
i.e., $\mathsf{Verify}(\text{params}, pk, v_i, \sigma) = \top$.

2. Recursively, for combined signatures: Consider two lists of public keys
$pk_1, pk_2$, two augmented vectors $v_1, v_2$, and two lists of signatures $\sigma_1, \sigma_2$ such
that

$$\mathsf{Verify}(\text{params}, pk_1, v_1, \sigma_1) = \mathsf{Verify}(\text{params}, pk_2, v_2, \sigma_2) = \top. \qquad (5.1)$$

Let $v'_1, v'_2, \sigma'$ be the output of $\mathsf{Merge}(v_1, v_2, \sigma_1, \sigma_2)$ and $f = |\sigma'|$. Let $H_i$ be
the list of all the hash elements in $\sigma_i$ for $i = 1, 2$. Let $\alpha_1, \alpha_2 \in \mathbb{F}_p$ be network
combination coefficients, and let $y = \alpha_1 v'_1 + \alpha_2 v'_2$. Let $\sigma', pk'$ be the output
of the Combine algorithm on inputs $(v_1, \sigma_1, pk_1, \alpha_1)$, $(v_2, \sigma_2, pk_2, \alpha_2)$.

Conditions (a) and (b) are now immediate. For (c), we note that in our
scheme $\sigma'_j = (\mathbf{h}_j, \eta_j)$ for $j = 1, \ldots, f$. Let $H = (\mathbf{h}_1, \ldots, \mathbf{h}_f)$. We examine
each step of the Verify algorithm:

1. By assumption (5.1) and the way we have set up the correspondence
between indices of $pk'$ and $\sigma'$, we have $\mathsf{Verify}_s(pk'_j, \mathbf{h}_j, \eta_j) = \top$ for
$j = 1, \ldots, f$.

2. By assumption (5.1) we know that $\mathsf{Test}_h(\text{params}, v_i, \beta_{v_i}, H_i) = \top$ for
$i = 1, 2$. By correctness property (2) of VH, for $i = 1, 2$ we have
$\mathsf{Test}_h(\text{params}, v'_i, \beta_{v'_i}, H) = \top$. Then, by correctness property (3) of VH,
we have $\mathsf{Test}_h(\text{params}, y, \beta_y, H) = \top$.

Thus $\mathsf{Verify}(\text{params}, pk', y, \sigma') = \top$.

We have the following security theorem; the proof is in the full paper [2] . 

Theorem 7. The network coding signature scheme $\mathcal{NS}$ is secure assuming that
VH is a secure vector hash and $\mathcal{S}$ is a secure signature scheme.
Abstract. Since their introduction in 2008, the non-interactive zero-
knowledge (NIZK) and non-interactive witness-indistinguishable (NIWI)
proofs designed by Groth and Sahai have been used in numerous
applications. In this paper, we offer two contributions to the study of these
proof systems. First, we identify and correct some errors, present in the
original online manuscript, that occur in two of the three instantiations
of the Groth-Sahai NIWI proofs: in these, the equation checked by the
verifier does not hold in honest executions of the protocol. In particular,
implementations of these proofs would not work correctly. We explain
why, perhaps surprisingly, the NIZK proofs that are built from these
NIWI proofs do not suffer from a similar problem. Second, we study
the efficiency of existing instantiations and note that only one of the
three instantiations has the potential to be practical. We therefore
propose a natural extension of an existing assumption from symmetric
pairings to asymmetric ones, which in turn enables Groth-Sahai proofs
based on new classes of efficient pairings.

1 Introduction 

BACKGROUND. Interactive proofs allow a prover who possesses some witness $w$
to convince a verifier that a certain statement $x \in L$ is true, where $L$ is some
language and $w$ is a witness that attests to this fact. A particularly fascinating class
of interactive proofs consists of those in which the interaction does not reveal information
about the witness, even if the verifier behaves maliciously. Two popular flavors
of witness privacy are witness indistinguishability [14], where it is infeasible for
an adversary to decide which of the possible witnesses is used by the prover, and
zero-knowledge [19,20], where it is possible to simulate the interaction between
the prover and the verifier without access to a witness. The two notions share
many commonalities, but they also differ in important respects and suit
different applications. For example, WI proofs can be executed in parallel
while preserving the privacy of the witness, whereas ZK proofs may fail in this
scenario.


P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010, LNCS 6056, pp. 177 2010. 
A variant of zero-knowledge proofs that is useful in many application scenarios
is the non-interactive one [H] (NIZK). In such proofs the interaction between
the prover and the verifier is minimal: the prover simply sends the verifier a
single message, after which the latter verifies the correctness of the proof without any
further interaction with the prover. It is not difficult to see that NIZK proofs
are impossible in the plain model [18], so some additional setup assumptions
are required. Originally, such proofs were constructed in a setting where parties
share a common random string (CRS) [15]. Later, non-interactive protocols were
also constructed by eliminating interaction through the use of random oracles [5].

Unsurprisingly, both zero-knowledge and witness-indistinguishable proofs have
found countless applications in cryptography. The power and versatility of such
proofs stem from general results that show how to construct zero-knowledge proof
systems for any language in NP [21]. For example, with zero-knowledge proofs a
party can prove that he/she is following a certain protocol without revealing any
information about his/her internal state; such proofs can therefore be used to compile protocols
secure against honest-but-curious adversaries into protocols secure against arbitrary
adversaries. Witness-indistinguishable proofs can be used, for instance, in the Yao
garbled-circuit protocol, to show that public commitments are commitments to
elements in $\{0,1\}$. The usability of proofs is tightly tied to the class of languages
to which they apply and to the efficiency of the associated proof systems. Clearly,
these two requirements pull in opposite directions. Indeed, the approach of [21] is quite
general, but combining general NP reductions with ZK protocols for the resulting
problems leads to highly impractical protocols even for the simplest languages.

A crucial step towards more efficient non-interactive zero-knowledge proofs
was the breakthrough work of Groth and Sahai [25]. The authors show how
to give NIWI and NIZK proofs for a large class of languages without going
through a general NP reduction. Numerous cryptographic results
use GS proofs to obtain efficient implementations of various primitives; see
the related work section for a (very partial) list of such works. In this paper,
we contribute to the understanding of these proofs in two different ways. We
extend the range of implementations to new, potentially more efficient settings,
and we fix an inconspicuous flaw that affects an important part of the original
online manuscript [26]. To explain our contributions, we recall some details of
the settings used by [25].

In the original (conference) version of the Groth-Sahai paper [25], the authors
give a general, abstract framework for the construction of NIWI/NIZK proofs
based on cryptographic pairings. We note that none of the errors we identify
occur in [25]. Proofs and details for three different instantiations are given in
the e-print archive version of the paper [26]. The first instantiation uses pairings
over groups of large composite order; the other two use pairings over prime-order
groups. The cryptographic assumptions on which the results rely are the subgroup
decision problem [5] in the first case, and the decisional linear assumption
(DLIN) [7] and the symmetric external Diffie-Hellman assumption (SXDH) [1]
for the remaining two instantiations, respectively. To obtain the latter
instantiations, the authors essentially use a general procedure [16] for converting protocols
from the subgroup decision setting for composite-order pairing groups into
protocols for the DLIN and SXDH assumptions in prime-order pairing groups.

Efficient implementations based on a new assumption. From a practical
perspective, pairings for groups of composite order are likely to have little
impact, due to their inherent inefficiency. The same holds true for symmetric
pairings, i.e., Type-1 pairings in the classification recalled in Section 2, which are the pairings
used in the second instantiation. Therefore, the only practical instantiation
proposed in [26] remains the one based on SXDH in Type-3 curves. In this paper,
we propose new GS proofs which can be used with the most efficient curves for
pairing-based cryptography. Our proposals are based on a natural extension of
the DLIN assumption from the symmetric setting to the asymmetric one. We
thus give DLIN-based GS proofs that work for all of the asymmetric pairing
types. In particular, our proofs are the first GS proofs that work for Type-2
pairings.

We wish to warn readers against judging the efficiency of the proof systems
based on Type-1 curves versus those based on Type-2 and Type-3 curves solely
on the number of group elements needed. The apparent efficiency of the former curves is
illusory: the key sizes for these curves grow faster, so the benefits are
immediately lost. We also note that the relative merits of the SXDH assumption
versus the DLIN assumption are a matter of debate in pairing-based cryptography.
Some people prefer the DLIN assumption as it applies to both symmetric
and asymmetric settings, although it has never been formally stated for the asymmetric
setting, and we need to formalise the underlying hard problem in this paper. On the other hand,
the SXDH assumption only applies to Type-3 pairings, which are the most
efficient pairings known. The SXDH assumption also usually results in cleaner
and simpler protocols, Groth-Sahai proofs being no exception. In addition,
the SXDH assumption is more closely related than the DLIN assumption to a long-standing, natural
number-theoretic problem, namely decisional Diffie-Hellman.

Fixing the inconspicuous flaw. The construction of Groth-Sahai NIZK
proofs in [25,26] proceeds in two stages. First, the authors show how to construct
NIWI proofs, and then, using a trick, they turn these proofs into full
zero-knowledge ones. Unfortunately, the NIWI proofs based on DLIN and SXDH
presented in [26] are actually invalid: the verification equation is not always
satisfied when the execution is between an honest prover and an honest verifier. As a result,
these proofs fail for many rather simple but quite useful statements. The
details are somewhat technical and we explain this point later in the paper. These
errors were introduced during the translation from the construction based on the
subgroup decision problem to the DLIN and SXDH settings [27]. Interestingly,
this problem does not affect the construction of NIZK proofs out of NIWI proofs,
since in this case the verification equation is always satisfied! Again, we elaborate
on this point later in the paper.

We believe that the reason why this error had not been discovered so far is
twofold. On the one hand, as explained above, GS NIZK proofs are actually correct.
On the other hand, when used in applications, GS (NIWI) proofs are usually
treated in a black-box way: the actual proofs are never spelled out, and the
associated equations are never verified. Clearly, the problem would immediately
show up in an implementation. We fix these problems by giving the correct
versions of the proofs.

Finally, we note that in an effort to encourage further study of the Groth- 
Sahai proofs we depart from the notation in the original paper and use some 
notation that we believe is more expressive and easier to follow. 

Related work. Despite their recent introduction, Groth-Sahai proofs have
been widely used. Since Groth-Sahai proofs apply to bilinear groups, they are
mainly used to design cryptographic primitives that do not rely on the random
oracle assumption. The proofs are used to prove knowledge of some secret
witnesses or as proofs of membership. The scenarios in which Groth-Sahai
proofs are used in the literature include proving the possession of a
signature without actually revealing the signature, proving that two ciphertexts
encrypt the same message, etc. For instance, they were used by Camenisch et al.
[H] to build an encryption scheme that is KDM-CCA2 secure. Also, the NIWI
and NIZK proofs were used by Belenkiy et al. [2,3] to design P-signatures and
anonymous credentials. Groth and Lu [2T] used the NIZK proofs to prove the
correctness of a shuffle. Huang et al. [35] used Groth-Sahai NIWI and NIZK
proofs to construct optimistic fair exchange protocols. In [31], Phong et al. used
the NIZK proofs to construct undeniable signatures. Belenkiy et al. [3] have
extensively used both the NIWI and NIZK proofs to construct many cryptographic
primitives such as P-signatures, verifiable random functions and compact e-cash
systems. Groth-Sahai proofs have also been used to construct group signatures
[23,30]. In [13,22] the proofs are used to design universally composable oblivious
transfer protocols. The first of these is particularly interesting from our
perspective: in [13] the authors use a NIWI proof to prove that a set of linear equations
holds. When this protocol is instantiated with the DLIN or SXDH protocols
from [26], one would not obtain a proof that verifies. This is an example of an
instance where the verification equations of the GS NIWI proofs are not valid.

Many of the previous applications of Groth-Sahai proofs for prime-order
groups are assumed to be in the (inefficient) symmetric pairing setting, as they
wish to use protocols based on the DLIN assumption. Alternatively, they are in the
asymmetric setting and need to make a DLIN assumption related to their scheme and
then an additional SXDH assumption to apply Groth-Sahai proofs. By
extending the DLIN setting to both Type-2 and Type-3 pairings we hope to simplify
future applications of Groth-Sahai proofs. In addition, by providing a mechanism
for implementing Groth-Sahai proofs in the Type-2 setting, we hope to open up
other applications.

2 Bilinear Groups 

Bilinear groups are a set of three groups $\mathbb{G}_1$, $\mathbb{G}_2$ and $\mathbb{G}_T$ of prime order $q$, along
with a bilinear map (deterministic function) $\hat{t}$ which takes as input an element
in $\mathbb{G}_1$ and an element in $\mathbb{G}_2$ and outputs an element in $\mathbb{G}_T$. We shall write $\mathbb{G}_1$
and $\mathbb{G}_2$ additively and $\mathbb{G}_T$ multiplicatively, and write $\mathbb{G}_1 = \langle P_1 \rangle$, $\mathbb{G}_2 = \langle P_2 \rangle$ for
two explicitly given generators $P_1$ and $P_2$.

The function $\hat{t}$ must have the following three properties:

1. Bilinearity: For all $Q_1 \in \mathbb{G}_1$, $Q_2 \in \mathbb{G}_2$ and $x, y \in \mathbb{Z}$, we have

$$\hat{t}([x]Q_1, [y]Q_2) = \hat{t}(Q_1, Q_2)^{xy}.$$

2. Non-degeneracy: The value $\hat{t}(P_1, P_2)$ generates $\mathbb{G}_T$.

3. The function $\hat{t}$ is efficiently computable.
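To experiment with the algebra in the rest of the paper, a toy stand-in for a pairing is convenient. The sketch below assumes the "exponent representation": $[x]P_1$ is stored as $x \bmod q$, $[y]P_2$ as $y \bmod q$, and $\hat{t}(P_1, P_2)^z$ as $z \bmod q$. The pairing then becomes multiplication mod $q$ and the $\mathbb{G}_T$ operation becomes addition. This model is trivially insecure because discrete logarithms are given away. It is only meant for checking identities such as bilinearity. The modulus $q = 1019$ and the function names are our own choices.

```python
q = 1019  # toy prime group order (assumption); real pairings use q of about 256 bits

# Exponent representation (insecure, for checking identities only):
#   [x]P1 in G1 is stored as x mod q, [y]P2 in G2 as y mod q,
#   and t(P1, P2)^z in G_T as z mod q.

def pairing(x, y):
    """t([x]P1, [y]P2) = t(P1, P2)^(x*y), returned as the exponent x*y mod q."""
    return (x * y) % q

def scalar(k, x):
    """[k]X for a G1 or G2 element X stored by its exponent."""
    return (k * x) % q

def gt_pow(z, k):
    """Raise the G_T element with exponent z to the power k."""
    return (z * k) % q

def is_bilinear(Q1, Q2, x, y):
    """Check t([x]Q1, [y]Q2) == t(Q1, Q2)^(xy) for the given elements and scalars."""
    return pairing(scalar(x, Q1), scalar(y, Q2)) == gt_pow(pairing(Q1, Q2), x * y)
```

In this model, non-degeneracy corresponds to `pairing(1, 1)` being a nonzero exponent. Efficient computability is trivial only because discrete logarithms are handed out, which is exactly why the model is a checking aid and not an implementation.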

In [17], pairings were categorized into three types:

Type-1: This is the symmetric pairing setting, in which $\mathbb{G}_1 = \mathbb{G}_2$.

Type-2: Here we have $\mathbb{G}_1 \neq \mathbb{G}_2$, but there is an efficiently computable
isomorphism $\psi: \mathbb{G}_2 \to \mathbb{G}_1$ with $\psi(P_2) = P_1$.

Type-3: Again $\mathbb{G}_1 \neq \mathbb{G}_2$, but now there is no known efficiently computable
isomorphism.

In the Type-1 setting the decision Diffie-Hellman problem is easy in $\mathbb{G}_1$, and
hence in $\mathbb{G}_2$. In the Type-2 setting the decision Diffie-Hellman problem is easy
in $\mathbb{G}_2$, but believed to be hard in $\mathbb{G}_1$. In the Type-3 setting the decision
Diffie-Hellman problem is believed to be hard in both $\mathbb{G}_1$ and $\mathbb{G}_2$. This last belief is
often formalised as the symmetric external Diffie-Hellman assumption:

Definition 1. Symmetric External Diffie-Hellman (SXDH) Assumption:
In Type-3 pairings the Decisional Diffie-Hellman (DDH) problem is hard
in both groups $\mathbb{G}_1$ and $\mathbb{G}_2$.

As a note on naming, the "external" part relates to the fact that we are talking
about DDH in $\mathbb{G}_1$ and $\mathbb{G}_2$, as opposed to the pairing-based BDDH problem.
The "symmetric" part is related to the fact that we are talking about DDH
being hard in both $\mathbb{G}_1$ and $\mathbb{G}_2$. It is perhaps unfortunate terminology that this
symmetry only applies in the asymmetric pairing setting!

As the SXDH assumption only applies to Type-3 pairings, it is common to make
the following assumption for Type-1 pairings, as a natural replacement for the
normal DDH assumption, which no longer holds in Type-1 pairings:

Definition 2. Decisional Linear Problem (DLIN) Assumption: For Type-1
pairings with $\mathbb{G}_1 = \mathbb{G}_2 = \mathbb{G}$ and $P = P_1 = P_2$, given the tuple $([a]P, [b]P, [ra]P,
[sb]P, [t]P)$ where $a, b, r, s, t \in \mathbb{F}_q$ are unknown, it is hard to tell whether $t = r + s$
or $t$ is random.

To extend this definition to the Type-2 or Type-3 setting, one could insist that
DLIN is hard in either $\mathbb{G}_1$ or $\mathbb{G}_2$; however, we will require that it is hard in both
$\mathbb{G}_1$ and $\mathbb{G}_2$. Following the naming of the SXDH assumption, we call this latter
notion the symmetric DLIN (SDLIN) assumption.

Definition 3. Symmetric Decisional Linear Problem (SDLIN)
Assumption: SDLIN is said to hold if DLIN is hard in both $\mathbb{G}_1$ and $\mathbb{G}_2$.
This is a stronger form of a version of the asymmetric DLIN problem considered
in other works such as [22], where a single problem with some instance variables
in $\mathbb{G}_1$ and some in $\mathbb{G}_2$ is considered.

We end this section by noting that in [5], Boneh et al. showed that the
existence of the isomorphism in the Type-2 setting can affect the security of some
cryptographic primitives. On the other hand, Chatterjee and Menezes m show 
that a protocol which is secure in the Type-2 setting can almost always be transferred
to one which is secure in the Type-3 setting.

3 Groth-Sahai Proofs

In [25,26] Groth and Sahai presented a way to construct efficient non-interactive
witness-indistinguishable and zero-knowledge proofs for a wide variety of
statements in the common reference string model. In this section, we recap their
notation and point out the problems with their presentation.

The NIZK proof systems allow the same methodology to be applied to four 
distinct types of equations, or three distinct types in the case of Type-1 pair¬ 
ings. In this section the four different types are presented in one go using the 
abstraction of Groth-Sahai. Later we present the specialisations to the different 
settings. 

Let $q$ be the order of $\mathbb{G}_1$, $\mathbb{G}_2$ and $\mathbb{G}_T$ as above. We first create $\mathbb{F}_q$-vector spaces
$A_1, A_2, A_T, B_1, B_2$ and $B_T$. In [26] these are $\mathbb{Z}_n$-modules and not $\mathbb{F}_q$-vector spaces,
since $n$ may be composite; in our situations we always have $n = q$, a prime. We
assume these vector spaces are equipped with bilinear maps $f: A_1 \times A_2 \to A_T$
and $F: B_1 \times B_2 \to B_T$. In addition, there are inclusion and projection maps for
each pair, i.e., we have maps $\iota_1: A_1 \to B_1$, $\iota_2: A_2 \to B_2$, $\iota_T: A_T \to B_T$ and
$p_1: B_1 \to A_1$, $p_2: B_2 \to A_2$, $p_T: B_T \to A_T$. Note that the $\iota$ maps are required
to be computable, but that the $p$ maps will not be computable in general. The
maps are extended to vectors of elements in a componentwise fashion.

All these maps need to satisfy the following commutative properties:

$$\forall x \in A_1,\ \forall y \in A_2: \quad F(\iota_1(x), \iota_2(y)) = \iota_T(f(x, y)),$$

$$\forall x \in B_1,\ \forall y \in B_2: \quad f(p_1(x), p_2(y)) = p_T(F(x, y)).$$

The essential problem in the DLIN and SXDH settings from [26] is that the
specific values of these maps, for three of the four equation types, do not result
in the first of these commutative properties holding. In particular, the map $\iota_T$
given there is incorrect. This makes the verification of the resulting
NIWI proofs invalid.

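The CRS we use in our proofs is a set of $\hat{m}_1$ and $\hat{m}_2$ elements of $B_1$ and
$B_2$, which we will denote by $u_1^{(1)}, \ldots, u_1^{(\hat{m}_1)} \in B_1$ and $u_2^{(1)}, \ldots, u_2^{(\hat{m}_2)} \in B_2$; we write $U_i$ for the vector $(u_i^{(1)}, \ldots, u_i^{(\hat{m}_i)})$. To
commit to an element $x \in A_i$ one picks $r = (r_1, \ldots, r_{\hat{m}_i}) \in \mathbb{F}_q^{\hat{m}_i}$ and computes

$$\mathrm{comm}_i(x) = \iota_i(x) + \sum_{j=1}^{\hat{m}_i} [r_j]\, u_i^{(j)} = \iota_i(x) + r \cdot U_i.$$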
Now suppose we wish to produce a NIWI proof for the equation

$$a \otimes y + x \otimes b + x \otimes \Gamma y = t, \qquad (1)$$

where we use the shorthand $x \otimes y$ for $f(x, y)$, with an obvious extension to
vectors. In the above equation, $x \in A_1^n$, $y \in A_2^m$ are the secret witnesses, with
$a \in A_1^m$, $b \in A_2^n$, $\Gamma \in \mathrm{Mat}_{n \times m}(\mathbb{F}_q)$, and $t \in A_T$ the known constants.

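We commit to $x$ and $y$ using the random values given by $R \in \mathrm{Mat}_{n \times \hat{m}_1}(\mathbb{F}_q)$
and $S \in \mathrm{Mat}_{m \times \hat{m}_2}(\mathbb{F}_q)$ via

$$c = \iota_1(x) + R\, U_1 \quad \text{and} \quad d = \iota_2(y) + S\, U_2.$$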

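The NIWI proof is then given by the following two vector values: one picks
$T \in \mathrm{Mat}_{\hat{m}_2 \times \hat{m}_1}(\mathbb{F}_q)$ uniformly at random and computes

$$\pi = R^\top \iota_2(b) + R^\top \Gamma \iota_2(y) + R^\top \Gamma S\, U_2 - T^\top U_2 \in B_2^{\hat{m}_1},$$

$$\theta = S^\top \iota_1(a) + S^\top \Gamma^\top \iota_1(x) + T\, U_1 \in B_1^{\hat{m}_2}.$$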

Verification of the proof $(\pi, \theta)$ is performed by checking whether

$$\iota_1(a) \bullet d + c \bullet \iota_2(b) + c \bullet \Gamma d = \iota_T(t) + U_1 \bullet \pi + \theta \bullet U_2$$

holds. Here we use $X \bullet Y$ as a shorthand for $F(X, Y)$, again with an obvious
extension to vectors.
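As a sanity check of the verification equation, the sketch below runs the prover and the verifier for the smallest interesting case. That case is one quadratic equation $a y + x b + \gamma x y = t$ in $\mathbb{F}_q$ with $n = m = 1$. It uses the SXDH maps of Section 4.1 with a binding key, $\hat{m}_1 = \hat{m}_2 = 1$, $\iota_i(x) = [x]W_i$ and $\iota_T(z) = F(W_1, W_2)^z$. Group elements are stored by their discrete logarithms (an element of $B_i$ is an exponent pair, an element of $B_T$ an exponent 4-tuple). The code is therefore an insecure algebra check, not an implementation. The modulus and all names are our own.

```python
import random

q = 1019  # toy prime order (assumption); B_i elements are exponent pairs, B_T exponent 4-tuples

def vadd(*vs):
    """Componentwise sum over F_q (the group operation of B_i or B_T on exponents)."""
    return tuple(sum(c) % q for c in zip(*vs))

def smul(k, v):
    """[k]v componentwise over F_q."""
    return tuple((k * c) % q for c in v)

def F(X, Y):
    """SXDH map F on exponents: (X1*Y1, X1*Y2, X2*Y1, X2*Y2)."""
    return ((X[0] * Y[0]) % q, (X[0] * Y[1]) % q,
            (X[1] * Y[0]) % q, (X[1] * Y[1]) % q)

def binding_key(rng):
    """Return (u^(1), W): u^(1) = (P, [a]P), binding u^(2) = [t]u^(1), W = u^(2) + (O, P)."""
    a, t = rng.randrange(1, q), rng.randrange(1, q)
    ck = (1, a)
    return ck, vadd(smul(t, ck), (0, 1))

def prove_and_verify(a, b, gamma, x, y, rng, t=None):
    """Prove a*y + x*b + gamma*x*y = t with witnesses x, y; return the verifier's verdict."""
    if t is None:
        t = (a * y + x * b + gamma * x * y) % q
    ck1, W1 = binding_key(rng)                      # ck1 = u_1^(1)
    ck2, W2 = binding_key(rng)                      # ck2 = u_2^(1)
    r, s, tau = (rng.randrange(q) for _ in range(3))
    c = vadd(smul(x, W1), smul(r, ck1))             # comm_1(x) = iota_1(x) + [r]u_1^(1)
    d = vadd(smul(y, W2), smul(s, ck2))             # comm_2(y) = iota_2(y) + [s]u_2^(1)
    pi = vadd(smul(r, smul(b, W2)), smul(r * gamma, smul(y, W2)),
              smul(r * gamma * s, ck2), smul(-tau, ck2))
    theta = vadd(smul(s, smul(a, W1)), smul(s * gamma, smul(x, W1)), smul(tau, ck1))
    lhs = vadd(F(smul(a, W1), d), F(c, smul(b, W2)), smul(gamma, F(c, d)))
    rhs = vadd(smul(t, F(W1, W2)), F(ck1, pi), F(theta, ck2))  # iota_T(t) = F(W1, W2)^t
    return lhs == rhs
```

Any $\iota_T$ is a homomorphism from $\mathbb{F}_q$, so it has the form $z \mapsto z \cdot v$ for some fixed $v \in B_T$. If $v \neq F(W_1, W_2)$, the check fails for every $t \neq 0$ but still passes when $t = 0$. This matches the behaviour of the NIWI and NIZK proofs described in this section.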

Notes. There are four possible instantiations of the equations:

- $A_1 = \mathbb{G}_1$, $A_2 = \mathbb{G}_2$, $f(P, Q) = \hat{t}(P, Q)$: This case is called pairing product
equations.

- $A_1 = \mathbb{G}_1$, $A_2 = \mathbb{F}_q$, $f(P, y) = [y]P$: This case is called multi-scalar
multiplication in $\mathbb{G}_1$.

- $A_1 = \mathbb{F}_q$, $A_2 = \mathbb{G}_2$, $f(x, Q) = [x]Q$: This case is called multi-scalar
multiplication in $\mathbb{G}_2$.

- $A_1 = \mathbb{F}_q$, $A_2 = \mathbb{F}_q$, $f(x, y) = x \cdot y$: This case is called quadratic equations
in $\mathbb{F}_q$.

In the DLIN and SXDH cases, the formulae for $\iota_T$ for the last three types of
equations are given incorrectly in [26]. The first commutative property is exactly what
makes the two sides of the verification equation agree. Examining the above method for
NIWI proofs therefore shows that the NIWI proofs would not verify unless the value $t$
was the trivial element.

We note that in the simpler, yet very common, setting of having $\Gamma = 0$ and
either $a = 0$ or $b = 0$ in equation (1), the proofs can be simplified further by
setting the random matrix $T$ to be zero.

The CRS, and hence the commitment scheme used to commit to elements in
$A_1$ and $A_2$, comes in two flavours: either we have a binding key or a hiding key.

- Binding key: This setting requires that, for $i = 1$ and $i = 2$, $p_i(\iota_i(x)) = x$
and $p_i(u_i^{(j)}) = 0$ for all $j$. Hence we have $p_i(\mathrm{comm}_i(x)) = x$, which gives
us a perfectly binding, computationally hiding commitment scheme. When
used in the proof, this results in perfectly sound proofs with computational
witness indistinguishability.

- Hiding key: This setting requires that $u_i^{(1)}, \ldots, u_i^{(\hat{m}_i)}$, i.e., the set of
commitment keys, generate the entire space $B_i$, and hence we have $\iota_i(A_i) \subseteq
\langle u_i^{(1)}, \ldots, u_i^{(\hat{m}_i)} \rangle$. Therefore, if the randomness vector $r$ is chosen uniformly,
the commitment scheme is computationally binding and perfectly hiding.
If this setting is used, the resulting proofs are computationally sound and
perfectly witness-indistinguishable.

The security of the whole system is ensured as long as the adversary is unable 
to distinguish between a hiding and a binding key. The security proofs can be 
found in . When producing a real system, one relies on a trusted third party 
to produce a binding key, however when producing a simulated proof etc. one 
relies on a hiding key, which essentially provides a trapdoor for the simulator in 
the CRS model. 

For the DLIN assumption in the Type-1 setting, [26] gives a method to
make the map $F$ symmetric, in the sense that $F(X, Y) = F(Y, X)$. We shall see,
when $F$ is instantiated below, that such a symmetry is not possible for Type-2
and Type-3 pairings. When $F$ is symmetric the associated proofs can be made
much simpler; we refer the reader to [26] for details.

To convert the above method for NIWI proofs into a method for NIZK proofs,
we first reorganize the above equation as

$$a \otimes y + (-1 \otimes t) + x \otimes b + x \otimes \Gamma y = \begin{cases} 0 & \text{if } A_T = \mathbb{F}_q, \\ \mathcal{O} & \text{if } A_T = \mathbb{G}_1 \text{ or } \mathbb{G}_2, \\ 1 & \text{if } A_T = \mathbb{G}_T. \end{cases}$$

The vector of commitments $c$ is extended to include a commitment to the
element one; this is done to deal with the extra term on the left-hand side of
the above equation. Then the above NIWI method is applied. As a result,
the NIZK proofs in the pairing product equation subcase only apply when
either $t = 1$ in equation (1), or one knows $P_1, \ldots, P_n$ and $Q_1, \ldots, Q_n$ such that
$t = \hat{t}(P_1, Q_1) \cdots \hat{t}(P_n, Q_n)$, since only then can the above transform be applied.
This is the only restriction in the method for obtaining NIZK proofs.

In all cases, to obtain NIZK proofs we apply the method for NIWI proofs in
the case where the equation is homogeneous, i.e., has a trivial right-hand side.
This latter point is crucial in understanding why the NIZK proofs from [26] work
but the NIWI proofs do not. Hence, even though $\iota_T$ was presented incorrectly
in [26], the method for NIZK proofs is correct: it only ever applies $\iota_T$ to the
trivial element, and any homomorphism maps the trivial element to the trivial element of $B_T$.

4 Equations for $\iota$ and $p$

From the last section, it is seen that the whole system depends on the choice
of the $\iota$ and $p$ maps, plus the CRS. The maps must be chosen so that they
have the required commutativity properties with respect to $f$ and $F$. In this section, we give
such maps and the relevant CRS for the SXDH and SDLIN examples in the
asymmetric pairing setting.
We present the data in the following way. For each setting we first present
the hiding and binding CRS, along with the map $F$ and the groups $B_i$ and $B_T$.
Then we present the maps $\iota_i$ and $p_i$ for the cases $A_i = \mathbb{F}_q$ and $A_i = \mathbb{G}_i$. At this
point we overload the symbols $\iota_i$ and $p_i$, with the precise maps being obtained
by type-checking. This helps simplify our notation somewhat.

Once the maps are defined, we can proceed to produce the commitment schemes
and the NIWI and NIZK proofs. Then, for the four types of equation being proved,
we present the maps $\iota_T$ and $p_T$ which make the required properties commute. With
these maps one can then verify the resulting NIWI proofs. Again we overload $\iota_T$
and $p_T$, with the precise map being determined by type-checking.

4.1 SXDH-Based Proofs 

Setup. We set $B_1 = \mathbb{G}_1^2$, $B_2 = \mathbb{G}_2^2$ and $B_T = \mathbb{G}_T^4$, all with operations performed
componentwise. We let

$$F: \begin{cases} B_1 \times B_2 \to B_T \\ \big((X_1, Y_1), (X_2, Y_2)\big) \mapsto \big(\hat{t}(X_1, X_2),\ \hat{t}(X_1, Y_2),\ \hat{t}(Y_1, X_2),\ \hat{t}(Y_1, Y_2)\big). \end{cases}$$

Since the underlying pairing $\hat{t}$ is bilinear, it follows that the map $F$ is also bilinear.
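To generate the CRS, the trusted party generates, for $i = 1, 2$, values $a_i, t_i \in \mathbb{F}_q^*$ at
random and defines

$$Q_i = [a_i]P_i, \quad U_i = [t_i]P_i, \quad V_i = [t_i]Q_i.$$

We now set

$$u_i^{(1)} = (P_i, Q_i) \in B_i,$$

$$u_i^{(2)} = \begin{cases} [t_i]u_i^{(1)} = (U_i, V_i) & \text{binding case}, \\ [t_i]u_i^{(1)} - (\mathcal{O}, P_i) = (U_i, V_i - P_i) & \text{hiding case}, \end{cases} \qquad u_i^{(2)} \in B_i.$$

The CRS is then the set $\{U_1, U_2\}$, where $U_1 = \{u_1^{(1)}, u_1^{(2)}\}$ and $U_2 = \{u_2^{(1)}, u_2^{(2)}\}$.

Under the SXDH assumption one cannot tell a binding key from a hiding key.
To aid what follows, we first set $W_i = u_i^{(2)} + (\mathcal{O}, P_i) = (W_{i,1}, W_{i,2}) \in B_i$.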

$\iota_i$, $p_i$ and $\mathrm{comm}_i$. We now define the maps $\iota_i: A_i \to B_i$, $p_i: B_i \to A_i$ and the
commitment scheme $\mathrm{comm}_i$. There are two cases we need to consider: $A_i = \mathbb{F}_q$
and $A_i = \mathbb{G}_i$.

$A_i = \mathbb{F}_q$: In this case we define the maps via

$$\iota_i: \begin{cases} \mathbb{F}_q \to B_i \\ x \mapsto [x]W_i \end{cases} \qquad p_i: \begin{cases} B_i \to \mathbb{F}_q \\ X = ([c_1]P_i, [c_2]P_i) \mapsto c_2 - a_i c_1. \end{cases}$$

Note that computing $p_i$ requires one to solve discrete logarithms. This is not an
issue, since at no point will we compute $p_i$; we simply need to know that it exists and
that it has the correct properties.
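The commitment scheme $\mathrm{comm}_i$ is obtained as before, except that we select $\hat{m}_i = 1$,
as opposed to $\hat{m}_i = 2$; this simplifies the equations somewhat. Hence we have

$$\mathrm{comm}_i: \begin{cases} \mathbb{F}_q \times \mathbb{F}_q \to B_i \\ (x, r) \mapsto \iota_i(x) + [r]u_i^{(1)}. \end{cases}$$

$A_i = \mathbb{G}_i$: In this case we define

$$\iota_i: \begin{cases} \mathbb{G}_i \to B_i \\ X \mapsto (\mathcal{O}, X) \end{cases} \qquad p_i: \begin{cases} B_i \to \mathbb{G}_i \\ X = (C_1, C_2) \mapsto C_2 - [a_i]C_1. \end{cases}$$

The commitment scheme $\mathrm{comm}_i$ is obtained as in our main discussion, i.e., with
$\hat{m}_i = 2$. Hence we have

$$\mathrm{comm}_i: \begin{cases} \mathbb{G}_i \times \mathbb{F}_q \times \mathbb{F}_q \to B_i \\ (X, r_1, r_2) \mapsto \iota_i(X) + [r_1]u_i^{(1)} + [r_2]u_i^{(2)}. \end{cases}$$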
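The binding and hiding behaviour of these commitments can be seen by computing them in the exponent representation. There, $([c_1]P_i, [c_2]P_i)$ is stored as $(c_1, c_2)$, and the normally uncomputable map $p_i$ becomes $c_2 - a_i c_1$. This formula is the same for $A_i = \mathbb{F}_q$ and $A_i = \mathbb{G}_i$; in the latter case the result is the discrete logarithm of $C_2 - [a_i]C_1$. The toy modulus and all function names are assumptions, and the model is insecure by design.

```python
q = 1019  # toy prime order (assumption); B_i elements are exponent pairs (c1, c2)

def vadd(*vs):
    return tuple(sum(c) % q for c in zip(*vs))

def smul(k, v):
    return tuple((k * c) % q for c in v)

def crs(a, t, binding=True):
    """Return u^(1), u^(2) and W = u^(2) + (O, P) for one side of the SXDH CRS."""
    u1 = (1, a)                                  # (P_i, Q_i) with Q_i = [a_i]P_i
    u2 = smul(t, u1)                             # binding: [t_i]u^(1) = (U_i, V_i)
    if not binding:
        u2 = vadd(u2, (0, -1))                   # hiding: (U_i, V_i - P_i)
    return u1, u2, vadd(u2, (0, 1))

def comm_scalar(x, r, u1, W):
    """comm_i for A_i = F_q: iota_i(x) + [r]u^(1), with iota_i(x) = [x]W_i."""
    return vadd(smul(x, W), smul(r, u1))

def comm_group(X, r1, r2, u1, u2):
    """comm_i for A_i = G_i: iota_i(X) + [r1]u^(1) + [r2]u^(2), with iota_i(X) = (O, X)."""
    return vadd((0, X), smul(r1, u1), smul(r2, u2))

def p(C, a):
    """p_i on exponents: c2 - a_i * c1."""
    return (C[1] - a * C[0]) % q
```

With the binding key, $p_i$ recovers the committed value. With the hiding key, $W_i = [t_i]u_i^{(1)}$, so the same commitment can be opened to any $x'$ by shifting the randomness to $r' = r + t_i(x - x')$.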

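$\iota_T$ and $p_T$. Here we have four cases, depending on which of the four types of
equation we are dealing with.

Pairing Product Equations.

$$\iota_T: \begin{cases} \mathbb{G}_T \to B_T \\ z \mapsto (1, 1, 1, z) \end{cases} \qquad p_T: \begin{cases} B_T \to \mathbb{G}_T \\ (c_{1,1}, c_{1,2}, c_{2,1}, c_{2,2}) \mapsto c_{2,2}\, c_{1,2}^{-a_1} \left(c_{2,1}\, c_{1,1}^{-a_1}\right)^{-a_2}. \end{cases}$$

Multi-Scalar Multiplication in $\mathbb{G}_1$ and $\mathbb{G}_2$.
In both of these cases we have

$$p_T: \begin{cases} B_T \to \mathbb{G}_i \\ (\xi^{s_1}, \xi^{s_2}, \xi^{s_3}, \xi^{s_4}) \mapsto [s_4 - a_1 s_2 - a_2 s_3 + a_1 a_2 s_1]P_i, \end{cases}$$

where $\xi = \hat{t}(P_1, P_2)$. For multi-scalar multiplication in $\mathbb{G}_1$ the map $\iota_T$ is defined
by

$$\iota_T: \begin{cases} \mathbb{G}_1 \to B_T \\ X \mapsto \big(1, 1, \hat{t}(X, W_{2,1}), \hat{t}(X, W_{2,2})\big), \end{cases}$$

whilst for multi-scalar multiplication in $\mathbb{G}_2$ the map $\iota_T$ is defined by

$$\iota_T: \begin{cases} \mathbb{G}_2 \to B_T \\ X \mapsto \big(1, \hat{t}(W_{1,1}, X), 1, \hat{t}(W_{1,2}, X)\big). \end{cases}$$

Note that these definitions differ from those given in [26]. The above definitions
produce the required commutative properties.

Quadratic Equations in $\mathbb{F}_q$.

In this case we have

$$p_T: \begin{cases} B_T \to \mathbb{F}_q \\ (\xi^{s_1}, \xi^{s_2}, \xi^{s_3}, \xi^{s_4}) \mapsto s_4 - a_1 s_2 - a_2 s_3 + a_1 a_2 s_1, \end{cases}$$

where $\xi = \hat{t}(P_1, P_2)$. The function $\iota_T$ is given by

$$\iota_T: \begin{cases} \mathbb{F}_q \to B_T \\ z \mapsto F(W_1, W_2)^z. \end{cases}$$

Again this is different from the map given in [26].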
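Both commutative properties for these SXDH maps can be checked numerically in the exponent representation. In that representation, $\hat{t}(P_1, P_2)^s$ is stored as $s$ and the identity of $\mathbb{G}_T$ as $0$. In exponents, the pairing-product $p_T$ reduces to the same formula $s_4 - a_1 s_2 - a_2 s_3 + a_1 a_2 s_1$ as in the other three cases. The sketch below tests the first property for the three $\iota_T$ maps given above and the second property for $p_T$. It is an insecure toy with our own names and modulus, meant only to confirm the algebra.

```python
import random

q = 1019  # toy prime order; every group element is stored by its exponent (insecure)

def F(X, Y):
    return ((X[0] * Y[0]) % q, (X[0] * Y[1]) % q, (X[1] * Y[0]) % q, (X[1] * Y[1]) % q)

def smul(k, v):
    return tuple((k * c) % q for c in v)

def iota_T_msm1(X, W2):
    """Multi-scalar multiplication in G1: (1, 1, t(X, W_21), t(X, W_22)); exponent 0 is 1."""
    return (0, 0, (X * W2[0]) % q, (X * W2[1]) % q)

def iota_T_msm2(Y, W1):
    """Multi-scalar multiplication in G2: (1, t(W_11, Y), 1, t(W_12, Y))."""
    return (0, (W1[0] * Y) % q, 0, (W1[1] * Y) % q)

def iota_T_quad(z, W1, W2):
    """Quadratic equations: F(W1, W2)^z."""
    return smul(z, F(W1, W2))

def p_i(C, a):
    return (C[1] - a * C[0]) % q

def p_T(S, a1, a2):
    """Exponent form of p_T: s4 - a1 s2 - a2 s3 + a1 a2 s1."""
    return (S[3] - a1 * S[1] - a2 * S[2] + a1 * a2 * S[0]) % q

def commutes(rng):
    a1, a2, t1, t2 = (rng.randrange(1, q) for _ in range(4))
    W1 = (t1, (t1 * a1 + 1) % q)                # binding-case W_1 = u_1^(2) + (O, P_1)
    W2 = (t2, (t2 * a2 + 1) % q)
    X, Y, x, y = (rng.randrange(q) for _ in range(4))
    checks = [
        F((0, X), smul(y, W2)) == iota_T_msm1(y * X % q, W2),       # f(X, y) = [y]X
        F(smul(x, W1), (0, Y)) == iota_T_msm2(x * Y % q, W1),       # f(x, Y) = [x]Y
        F(smul(x, W1), smul(y, W2)) == iota_T_quad(x * y, W1, W2),  # f(x, y) = xy
        p_T(F((X, x), (Y, y)), a1, a2) == p_i((X, x), a1) * p_i((Y, y), a2) % q,
    ]
    return all(checks)
```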
4.2 SDLIN-Based Proofs

We now perform a similar analysis when we wish to base security on the SDLIN
problem. Recall that in [26] this situation is only described for the Type-1 pairing
setting. What we describe below can be used in both the Type-2 and Type-3
settings. In addition, by specialising it to the Type-1 setting and applying
the optimization of [26] to produce a symmetric version of $F(X, Y)$, one obtains
more efficient NIZK proofs for Type-1 pairings as well.

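Setup. We set $B_1 = \mathbb{G}_1^3$, $B_2 = \mathbb{G}_2^3$ and $B_T = \mathbb{G}_T^9$, all with operations performed
componentwise. We let

$$F: \begin{cases} B_1 \times B_2 \to B_T \\ \big((X_1, Y_1, Z_1), (X_2, Y_2, Z_2)\big) \mapsto \begin{pmatrix} \hat{t}(X_1, X_2) & \hat{t}(X_1, Y_2) & \hat{t}(X_1, Z_2) \\ \hat{t}(Y_1, X_2) & \hat{t}(Y_1, Y_2) & \hat{t}(Y_1, Z_2) \\ \hat{t}(Z_1, X_2) & \hat{t}(Z_1, Y_2) & \hat{t}(Z_1, Z_2) \end{pmatrix}. \end{cases}$$

Since the underlying pairing $\hat{t}$ is bilinear, it follows that the map $F$ is also bilinear.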
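To generate the CRS, the trusted party generates, for $i = 1, 2$, values $a_i, r_i, s_i, t_i \in \mathbb{F}_q^*$ at
random and defines

$$U_i = [a_i]P_i, \quad V_i = [t_i]P_i.$$

We now set

$$u_i^{(1)} = (U_i, \mathcal{O}, P_i) \in B_i, \qquad u_i^{(2)} = (\mathcal{O}, V_i, P_i) \in B_i,$$

$$u_i^{(3)} = \begin{cases} [r_i]u_i^{(1)} + [s_i]u_i^{(2)} = ([r_i]U_i, [s_i]V_i, [r_i + s_i]P_i) & \text{binding case}, \\ [r_i]u_i^{(1)} + [s_i]u_i^{(2)} - (\mathcal{O}, \mathcal{O}, P_i) = ([r_i]U_i, [s_i]V_i, [r_i + s_i - 1]P_i) & \text{hiding case}. \end{cases}$$

The CRS is then the set $\{U_1, U_2\}$, where $U_1 = \{u_1^{(1)}, u_1^{(2)}, u_1^{(3)}\}$ and $U_2 =
\{u_2^{(1)}, u_2^{(2)}, u_2^{(3)}\}$. Under the SDLIN assumption one cannot tell a binding key
from a hiding key. To aid notation in what follows, we first set $W_i = u_i^{(3)} +
(\mathcal{O}, \mathcal{O}, P_i) = (W_{i,1}, W_{i,2}, W_{i,3}) \in B_i$.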
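The defining properties of the two SDLIN keys can be checked directly in the exponent representation, where $([c_1]P_i, [c_2]P_i, [c_3]P_i)$ is stored as $(c_1, c_2, c_3)$. With the binding key, $p_i$ kills all three $u_i^{(j)}$ and maps $W_i$ to $1$. With the hiding key, $W_i$ lies in the span of $u_i^{(1)}$ and $u_i^{(2)}$. The sketch is an insecure toy with hypothetical names. The three-argument `pow` with exponent $-1$ (modular inverse) needs Python 3.8 or later.

```python
q = 1019  # toy prime order; B_i elements are exponent triples (insecure model)

def vadd(*vs):
    return tuple(sum(c) % q for c in zip(*vs))

def smul(k, v):
    return tuple((k * c) % q for c in v)

def sdlin_crs(a, t, r, s, binding=True):
    """Return u^(1), u^(2), u^(3) and W = u^(3) + (O, O, P) for one side of the SDLIN CRS."""
    u1 = (a, 0, 1)                               # (U_i, O, P_i), U_i = [a_i]P_i
    u2 = (0, t, 1)                               # (O, V_i, P_i), V_i = [t_i]P_i
    u3 = vadd(smul(r, u1), smul(s, u2))          # binding: [r_i]u^(1) + [s_i]u^(2)
    if not binding:
        u3 = vadd(u3, (0, 0, -1))                # hiding: additionally subtract (O, O, P_i)
    return u1, u2, u3, vadd(u3, (0, 0, 1))

def p(C, a, t):
    """p_i on exponents: c3 - c1/a_i - c2/t_i."""
    return (C[2] - pow(a, -1, q) * C[0] - pow(t, -1, q) * C[1]) % q
```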

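$\iota_i$, $p_i$ and $\mathrm{comm}_i$. We now define the maps $\iota_i: A_i \to B_i$, $p_i: B_i \to A_i$ and the
commitment scheme $\mathrm{comm}_i$. There are two cases: $A_i = \mathbb{F}_q$ and $A_i = \mathbb{G}_i$.

$A_i = \mathbb{F}_q$: We define the maps via

$$\iota_i: \begin{cases} \mathbb{F}_q \to B_i \\ x \mapsto [x]W_i \end{cases} \qquad p_i: \begin{cases} B_i \to \mathbb{F}_q \\ X = ([c_1]P_i, [c_2]P_i, [c_3]P_i) \mapsto c_3 - \frac{1}{a_i}c_1 - \frac{1}{t_i}c_2. \end{cases}$$

The commitment scheme $\mathrm{comm}_i$ is obtained as before, except that we select $\hat{m}_i = 2$,
as opposed to $\hat{m}_i = 3$; this again simplifies the equations. Hence we have

$$\mathrm{comm}_i: \begin{cases} \mathbb{F}_q \times \mathbb{F}_q \times \mathbb{F}_q \to B_i \\ (x, r_1, r_2) \mapsto \iota_i(x) + [r_1]u_i^{(1)} + [r_2]u_i^{(2)}. \end{cases}$$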
$A_i = G_i$: We define

$$\iota_i : G_i \to B_i,\quad X \mapsto (0,0,X), \qquad p_i : B_i \to G_i,\quad \mathcal{X} = (C_1, C_2, C_3) \mapsto C_3 - [\tfrac{1}{a_i}]C_1 - [\tfrac{1}{t_i}]C_2.$$

The commitment scheme comm$_i$ is obtained as in our main discussion, i.e. with $m_i = 3$. Hence we have

$$\mathrm{comm}_i : G_i \times \mathbb{F}_q \times \mathbb{F}_q \times \mathbb{F}_q \to B_i,\qquad (X, r_1, r_2, r_3) \mapsto \iota_i(X) + [r_1]u_i^{(1)} + [r_2]u_i^{(2)} + [r_3]u_i^{(3)}.$$
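The extraction maps can be checked in the same toy discrete-log model (an assumption made purely for illustration: $[c]P_i$ is stored as $c \bmod q$). The sketch below builds a binding-mode key, commits to a scalar with $m_i = 2$ and to a group element with $m_i = 3$, and confirms that $p_i$ recovers the committed value. This works because $p_i$ vanishes on $u_i^{(1)}, u_i^{(2)}, u_i^{(3)}$ in the binding case and satisfies $p_i(W_i) = 1$. All function names are hypothetical helpers, not part of the scheme.

```python
import random

# Toy discrete-log model (assumption, illustration only): [c]P_i is stored as c mod q.
q = 2**61 - 1


def add(x, y):
    return tuple((xi + yi) % q for xi, yi in zip(x, y))


def smul(k, x):
    return tuple(k * xi % q for xi in x)


def binding_crs(rng):
    """Binding-mode key for one side: returns (a_i, t_i, [u1, u2, u3], W_i)."""
    a, r, s, t = (rng.randrange(1, q) for _ in range(4))
    u1, u2 = (a, 0, 1), (0, t, 1)
    u3 = add(smul(r, u1), smul(s, u2))  # binding: u3 = [r_i]u1 + [s_i]u2
    W = add(u3, (0, 0, 1))              # W_i = u3 + (0, 0, P_i)
    return a, t, [u1, u2, u3], W


def p_i(a, t, X):
    """p_i(C1, C2, C3) = C3 - [1/a_i]C1 - [1/t_i]C2, computed on discrete logs."""
    c1, c2, c3 = X
    return (c3 - c1 * pow(a, -1, q) - c2 * pow(t, -1, q)) % q


def comm_scalar(x, r1, r2, u, W):
    """A_i = F_q with m_i = 2: [x]W_i + [r1]u1 + [r2]u2."""
    return add(add(smul(x, W), smul(r1, u[0])), smul(r2, u[1]))


def comm_group(X, rs, u):
    """A_i = G_i with m_i = 3: (0, 0, X) + [r1]u1 + [r2]u2 + [r3]u3."""
    C = (0, 0, X % q)
    for r, uj in zip(rs, u):
        C = add(C, smul(r, uj))
    return C


rng = random.Random(7)
a, t, u, W = binding_crs(rng)
x, X = 42, 123456789
print(p_i(a, t, comm_scalar(x, rng.randrange(q), rng.randrange(q), u, W)))  # prints 42
print(p_i(a, t, comm_group(X, [rng.randrange(q) for _ in range(3)], u)))    # prints 123456789
```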


$\iota_T$ and $p_T$. Here we have four cases, depending on which of the four types of equation we are dealing with.

Pairing Product Equations.



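$$p_T : B_T \to G_T,\qquad (Z_{j,i})_{j,i=1}^{3} \mapsto \gamma_1^{-1/a_2}\,\gamma_2^{-1/t_2}\,\gamma_3,$$

where $\gamma_i = Z_{1,i}^{-1/a_1}\, Z_{2,i}^{-1/t_1}\, Z_{3,i}$.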
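As a sanity check of this $p_T$, the sketch below works in a toy exponent model (an illustrative assumption: $[x]P_1$ and $[y]P_2$ are stored as $x, y \bmod q$, and an element $\xi^z$ of $G_T$ with $\xi = t(P_1, P_2)$ is stored as $z$, so products in $G_T$ become sums). It verifies the commutation property $p_T(F(\mathcal{X}, \mathcal{Y})) = t(p_1(\mathcal{X}), p_2(\mathcal{Y}))$ for random $\mathcal{X} \in B_1$, $\mathcal{Y} \in B_2$. Each $\gamma_i$ is $p_1$ applied to a column of the matrix, and $p_T$ then applies $p_2$ to $(\gamma_1, \gamma_2, \gamma_3)$.

```python
import random

# Toy exponent model (assumption, illustration only): [x]P_1, [y]P_2 are stored
# as x, y mod q, and an element xi^z of G_T, with xi = t(P_1, P_2), is stored as z.
# Then t([x]P_1, [y]P_2) = xi^(x*y) becomes x*y, and products in G_T become sums.
q = 2**61 - 1


def F(X, Y):
    """F(X, Y)_{j,k} = t(X_j, Y_k) for X in B_1, Y in B_2 (exponent form)."""
    return [[x * y % q for y in Y] for x in X]


def p_side(a, t, X):
    """p_i(C1, C2, C3) = C3 - [1/a_i]C1 - [1/t_i]C2 on exponents."""
    return (X[2] - X[0] * pow(a, -1, q) - X[1] * pow(t, -1, q)) % q


def p_T(a1, t1, a2, t2, Z):
    """gamma_i = Z_{1,i}^(-1/a1) Z_{2,i}^(-1/t1) Z_{3,i}; then gamma_1^(-1/a2) gamma_2^(-1/t2) gamma_3."""
    gamma = [p_side(a1, t1, [Z[0][i], Z[1][i], Z[2][i]]) for i in range(3)]
    return p_side(a2, t2, gamma)


rng = random.Random(3)
a1, t1, a2, t2 = (rng.randrange(1, q) for _ in range(4))
X = [rng.randrange(q) for _ in range(3)]  # arbitrary element of B_1
Y = [rng.randrange(q) for _ in range(3)]  # arbitrary element of B_2
lhs = p_T(a1, t1, a2, t2, F(X, Y))
rhs = p_side(a1, t1, X) * p_side(a2, t2, Y) % q  # exponent of t(p_1(X), p_2(Y))
print(lhs == rhs)  # True
```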

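Multi-Scalar Multiplication in $G_1$ and $G_2$.

In both of these cases we have

$$p_T : B_T \to A_T,\qquad \begin{pmatrix} \xi^{s_{1,1}} & \xi^{s_{1,2}} & \xi^{s_{1,3}}\\ \xi^{s_{2,1}} & \xi^{s_{2,2}} & \xi^{s_{2,3}}\\ \xi^{s_{3,1}} & \xi^{s_{3,2}} & \xi^{s_{3,3}} \end{pmatrix} \longmapsto \Big[s_3 - \tfrac{1}{a_2}s_1 - \tfrac{1}{t_2}s_2\Big]P,$$

where $\xi = t(P_1, P_2)$, $P = P_1$ for multi-scalar multiplication in $G_1$ and $P = P_2$ for multi-scalar multiplication in $G_2$, and

$$s_i = s_{3,i} - \tfrac{1}{a_1}s_{1,i} - \tfrac{1}{t_1}s_{2,i}.$$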

For multi-scalar multiplication in $G_1$ the map $\iota_T$ is defined by

$$\iota_T : G_1 \to B_T,\qquad X \mapsto \begin{pmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ t(X, W_{2,1}) & t(X, W_{2,2}) & t(X, W_{2,3}) \end{pmatrix},$$

whilst for multi-scalar multiplication in $G_2$ the map $\iota_T$ is defined by

$$\iota_T : G_2 \to B_T,\qquad X \mapsto \begin{pmatrix} 1 & 1 & t(W_{1,1}, X)\\ 1 & 1 & t(W_{1,2}, X)\\ 1 & 1 & t(W_{1,3}, X) \end{pmatrix}.$$

When specialised to the symmetric case, these definitions of $\iota_T$ differ from those given in [26]. The above definitions produce the required commutative properties.
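The commutative property $p_T(\iota_T(X)) = X$ in the binding case can be checked in the same toy exponent model, an illustrative assumption in which group elements are stored as discrete logs mod $q$. The sketch below implements the two $\iota_T$ maps above together with the scalar $s_3 - \frac{1}{a_2}s_1 - \frac{1}{t_2}s_2$, where $s_i = s_{3,i} - \frac{1}{a_1}s_{1,i} - \frac{1}{t_1}s_{2,i}$; the returned integer stands for the exponent of the output point. It relies on $p_i(W_i) = 1$ for a binding-mode $W_i$. The helper names are hypothetical.

```python
import random

# Toy exponent model (assumption, illustration only): group elements are stored
# as discrete logs mod q; an element xi^s of G_T, xi = t(P_1, P_2), is stored as s.
q = 2**61 - 1


def p_side(a, t, X):
    """C3 - C1/a - C2/t on a triple of exponents."""
    return (X[2] - X[0] * pow(a, -1, q) - X[1] * pow(t, -1, q)) % q


def p_T_scalar(a1, t1, a2, t2, Z):
    """s_3 - s_1/a_2 - s_2/t_2 with s_k = s_{3,k} - s_{1,k}/a_1 - s_{2,k}/t_1."""
    s = [p_side(a1, t1, [Z[0][k], Z[1][k], Z[2][k]]) for k in range(3)]
    return p_side(a2, t2, s)


def binding_W(rng):
    """Return (a_i, t_i, W_i) for a binding key: W_i = ([r]U_i, [s]V_i, [r+s+1]P_i)."""
    a, r, s, t = (rng.randrange(1, q) for _ in range(4))
    return a, t, (r * a % q, s * t % q, (r + s + 1) % q)


def iota_T_G1(X, W2):
    """Rows 1 and 2 are the identity of G_T (exponent 0); row 3 is t(X, W_{2,k})."""
    return [[0, 0, 0], [0, 0, 0], [X * w % q for w in W2]]


def iota_T_G2(X, W1):
    """Columns 1 and 2 are the identity of G_T; column 3 is t(W_{1,j}, X)."""
    return [[0, 0, w * X % q] for w in W1]


rng = random.Random(5)
a1, t1, W1 = binding_W(rng)
a2, t2, W2 = binding_W(rng)
X = 987654321  # discrete log of the input point
print(p_T_scalar(a1, t1, a2, t2, iota_T_G1(X, W2)))  # prints 987654321
print(p_T_scalar(a1, t1, a2, t2, iota_T_G2(X, W1)))  # prints 987654321
```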
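Quadratic Equations in $\mathbb{F}_q$.
In this case we have

$$p_T : B_T \to \mathbb{F}_q,\qquad \begin{pmatrix} \xi^{s_{1,1}} & \xi^{s_{1,2}} & \xi^{s_{1,3}}\\ \xi^{s_{2,1}} & \xi^{s_{2,2}} & \xi^{s_{2,3}}\\ \xi^{s_{3,1}} & \xi^{s_{3,2}} & \xi^{s_{3,3}} \end{pmatrix} \longmapsto s_3 - \tfrac{1}{a_2}s_1 - \tfrac{1}{t_2}s_2,$$

where again we have $\xi = t(P_1, P_2)$ and

$$s_i = s_{3,i} - \tfrac{1}{a_1}s_{1,i} - \tfrac{1}{t_1}s_{2,i}.$$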


The function $\iota_T$ is given by

Again this is different from the mapping given in [26].

4.3 Combining SXDH and SDLIN 

We end this section by noting an extension which was pointed out to us by J. Groth [27]. If one wanted to work with Type-2 pairings and wanted a more efficient instantiation, one could implement a system using DDH in $G_1$ and DLIN in $G_2$. We do not expand on the details of this construction, but remark that elements in $B_1$ would then consist of two elements in $G_1$, elements in $B_2$ of three elements in $G_2$, and elements in $B_T$ of six elements in $G_T$. This added efficiency comes at the expense of having to assume DDH in $G_1$, which defeats what some people (although not the current authors) see as the benefit of DLIN-based constructions in pairing-based cryptography: namely, that a single protocol description can apply to all three main pairing types.

5 Performance Comparison 

In this section we compare the relative commitment sizes of the different instantiations; the resulting proof sizes can be deduced from these and show a similar relative comparison. From the sizes of the elements in the groups $B_1$ and $B_2$ one can also easily estimate the relative computational performance, since the cost of group operations grows essentially quadratically with the bit length. Before proceeding, we also note that Groth-Sahai proofs will usually be used in the context of another protocol or scheme, which is likely to dictate the exact pairing type one is using; hence the following comparison is only for illustrative purposes.

To provide concrete numbers we assume a security level equivalent to 128 bits of symmetric key security. Using “standard” comparisons of different key sizes, this equates to a minimum size of $G_T$ of 3072 bits and a minimum size of elements in $G_1$ of 256 bits. We let $k$ denote the pairing embedding degree. For Type-1 pairings the value of $k$ is bounded by two for elliptic curves defined over fields of large prime characteristic, and by six for curves defined over fields of characteristic three. For Type-2 and Type-3 curves the “optimal” value of $k$ at this security level is $k = 12$. A crucial observation is that for Type-3 curves we have the ability to compress the elements in $G_2$ by a factor of six at this security level by using BN curves.

We summarize the commitment sizes (in bits), i.e. the sizes of elements in $B_1$ and $B_2$, as well as the proof sizes (also in bits), in Table 1. From the table it would appear that using the SDLIN setting as introduced in this paper gives no advantage. However, this overlooks the fact that the point of Groth-Sahai proofs is to use them in other protocols and schemes. These protocols and schemes may require one to work in the Type-2 setting, or to base one's security on the SDLIN assumption. In these situations it makes more sense to use Groth-Sahai proofs suited to the particular protocol. In addition, some researchers prefer the SDLIN setting to the SXDH setting because they prefer not to use the “special” pairing setting of Type-3, where there is no computable isomorphism from $G_2$ to $G_1$.

For the Type-1 setting we give two figures, representing the cases of large prime characteristic and of characteristic three. Note that in all cases the size of a proof is equal to $m_1$ elements of $B_2$ plus $m_2$ elements of $B_1$, except in the case of Type-1 pairings, where, due to the symmetric nature of the map $F(\mathcal{X},\mathcal{Y})$, one can simplify this to $\max(m_1, m_2)$ elements of $B_1 = B_2$.


Table 1. Summary of the different instantiations

Pairing type                  1                   2                3                3
Hard problem                  DLIN                SDLIN            SDLIN            SXDH
|G_1|                         1536/512            256              256              256
|G_2|                         1536/512            3072             512              512
|B_1|                         3|G_1| = 4608/1536  3|G_1| = 768     3|G_1| = 768     2|G_1| = 512
|B_2|                         3|G_2| = 4608/1536  3|G_2| = 9216    3|G_2| = 1536    2|G_2| = 1024
Pairing product equations
  (m_1, m_2)                  (3,3)               (3,3)            (3,3)            (2,2)
  Size                        13824/4608          29952            6912             3072
Multi-scalar multiplication in G_1
  (m_1, m_2)                  (3,2)               (3,2)            (3,2)            (2,1)
  Size                        13824/4608          29184            6144             2560
Multi-scalar multiplication in G_2
  (m_1, m_2)                  (2,3)               (2,3)            (2,3)            (1,2)
  Size                        13824/4608          20736            5376             2048
Quadratic equations in F_q
  (m_1, m_2)                  (2,2)               (2,2)            (2,2)            (1,1)
  Size                        9216/3072           19968            4608             1536
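The size entries in Table 1 follow mechanically from the group sizes and the $(m_1, m_2)$ pairs. The short Python sketch below recomputes them using the rule stated above: $m_1$ elements of $B_2$ plus $m_2$ elements of $B_1$, or $\max(m_1, m_2)$ elements of $B_1 = B_2$ for Type-1. As an illustrative reading of the text (an assumption, not a statement from it), $|G_2|$ for Types 2 and 3 is taken as $12 \times 256$ bits before and after the factor-six compression, and $|G_1|$ for Type-1 as $3072/k$ with $k = 2$ or $k = 6$; this reproduces the group sizes listed in the table. The dictionary layout and names are hypothetical.

```python
GT_BITS = 3072  # minimum size of G_T at the 128-bit level
G1_BITS = 256   # minimum size of elements in G_1 (Types 2 and 3)

# name: (|G_1|, |G_2|, group elements per B_i, symmetric Type-1 map?)
INSTANTIATIONS = {
    "Type-1 DLIN, k=2": (GT_BITS // 2, GT_BITS // 2, 3, True),
    "Type-1 DLIN, k=6": (GT_BITS // 6, GT_BITS // 6, 3, True),
    "Type-2 SDLIN": (G1_BITS, 12 * G1_BITS, 3, False),       # G_2 not compressed
    "Type-3 SDLIN": (G1_BITS, 12 * G1_BITS // 6, 3, False),  # factor-six compression
    "Type-3 SXDH": (G1_BITS, 12 * G1_BITS // 6, 2, False),
}

# (m_1, m_2), keyed by elements per B_i: 3 for DLIN/SDLIN, 2 for SXDH.
SHAPES = {
    "PPE": {3: (3, 3), 2: (2, 2)},
    "MSM in G_1": {3: (3, 2), 2: (2, 1)},
    "MSM in G_2": {3: (2, 3), 2: (1, 2)},
    "QE in F_q": {3: (2, 2), 2: (1, 1)},
}


def proof_size(name, equation):
    """Proof size in bits for one instantiation and equation type."""
    g1, g2, n, symmetric = INSTANTIATIONS[name]
    b1, b2 = n * g1, n * g2
    m1, m2 = SHAPES[equation][n]
    if symmetric:
        return max(m1, m2) * b1  # Type-1: B_1 = B_2
    return m1 * b2 + m2 * b1     # m_1 elements of B_2 plus m_2 elements of B_1


for name in INSTANTIATIONS:
    print(name, [proof_size(name, eq) for eq in SHAPES])
```

Each printed list reproduces the corresponding column of size entries in Table 1, in the order PPE, MSM in $G_1$, MSM in $G_2$, QE in $\mathbb{F}_q$.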
6 Summary

We have extended the Groth-Sahai techniques to pairings in the Type-2 setting, and to using the DLIN assumption in the Type-3 setting. This required us to introduce a minor extension to the DLIN hardness assumption. In doing so we corrected a number of mistakes in the formulae presented in [26]. Using our formulae, all valid NIWI proofs in both the DLIN and SXDH settings will now verify.
Abstract. When commitment schemes are used in complex environments, e.g., the Internet, the issue of malleability arises, i.e., a concurrent man-in-the-middle adversary might generate commitments to values related to ones committed to by honest players. In the plain model, the current best solution towards resolving this problem in a constant number of rounds is the work of Ostrovsky, Persiano and Visconti (TCC’09).

They constructed a constant-round commitment scheme that is concurrent non-malleable with respect to both commitment and decommitment. However, the scheme is only computationally binding. For application scenarios where the security of receivers is of great concern, computational binding may not suffice.

In this work, we follow the line of their work and give a construction of a statistically binding commitment scheme which is concurrent non-malleable with respect to both commitment and decommitment. Our work can be seen as a complement of the work of Ostrovsky et al. in the plain model. Our construction relies on the existence of a family of pairs of claw-free permutations and only needs a constant number of communication rounds in the plain model. Our proof of security uses non-black-box techniques and satisfies the (most powerful) simulation-based definitions of non-malleability.

Keywords: commitment schemes, statistically binding, non-malleability. 

1 Introduction 

A commitment scheme is a two-phase interactive protocol between two parties: the committer, who holds a value, and the receiver. It enables the committer to commit itself to a value while keeping it secret from the receiver. Two basic properties of a commitment scheme are the hiding property (the receiver cannot learn the committed value before the decommitment phase) and the binding property (the committer is bound to one value after the commitment phase). In the literature, two fundamental types of commitment schemes, statistically hiding and statistically binding, are considered.

* Work done while Ivan Visconti was visiting UCLA, USA.

Z. Cao, I. Visconti, and Z. Zhang

It is well known that the basic properties of commitment schemes cannot prevent “malleability” attacks mounted by a probabilistic polynomial-time (PPT) man-in-the-middle (MIM) adversary who has full control of the communication channel between the committer and the receiver. The concept of non-malleability was first introduced by Dolev et al. [1] to capture security concerns in such settings. Loosely speaking, a commitment scheme is non-malleable if one cannot transform the commitment of a value into a commitment of a related value. This kind of non-malleability is called non-malleability with respect to commitment (NMc for short) [1]. This definition is based on the independence of the committed messages played by the MIM adversary from the ones played by the committer. The notion of non-malleability used by Di Crescenzo et al. [2] is called non-malleability with respect to decommitment or opening (NMd for short), i.e., the adversary cannot construct a commitment from a given one such that, after having seen the opening of the original commitment, the adversary is able to correctly open its commitment with a related value. This definition requires that the success probability of a MIM adversary is maintained by a stand-alone simulator. Subsequent NMc definitions are modified in a similar way 0H5M0. Simulation-based definitions are much more useful when a commitment scheme is used as a building block in a larger protocol, since the existence of a simulator greatly simplifies the task of proving the security of the larger protocol.

Intuitively, it seems that NMc is stronger than NMd. However, this depends on the subtleties of the definitions. Indeed, this does not necessarily hold, at least with respect to the non-malleability definitions in |2lll3j . In a journal version of [3], the authors [5] presented a stringent definition of non-malleability w.r.t. commitment in order to imply the notion of non-malleability w.r.t. opening.

Several previous results focused on designing statistically hiding commitment schemes which are NMd. Based on number-theoretic assumptions, NMd commitment schemes were designed in |10I3| assuming the existence of a common reference string (CRS) that is shared by the two players before the protocol execution. Thus, their schemes do not work in the plain model (i.e., without setup assumptions). Recently, Pass and Rosen [I I5| presented a slightly different definition of NMd.¹ They then constructed a commitment scheme under their NMd definition, based on a family of collision-resistant hash functions in the plain model. Their scheme is round-efficient and needs only a constant number of communication rounds. More recently, based on the work of cm, Zhang et al. [T3] presented a non-malleable commitment scheme under the weakest assumption, i.e., the existence of one-way functions.


1 More precisely, the NMd definitions in [2,10] do not take into account possible a priori information the adversary might have about the commitment received in the left interaction, while the definitions in [3,4,5] do. The definitions in Elm do not provide the stand-alone simulator with the value committed in the left interaction after the commitment phase is finished, while the definitions in [4,5] do.
Before the work of it was commonly believed that NMd (compared with
NMc) is the only notion that makes sense in a computationally binding commitment scheme [II. However, Ostrovsky et al. [8] argued that by slightly relaxing the NMc definition,² NMc can also be achieved for computationally binding commitment schemes. They considered concurrent MIM attacks where the adversary can simultaneously participate in any polynomial number of executions as a receiver and as a committer. Based on the work of [6,7], and using some techniques already introduced in mm, they gave a computationally hiding and computationally binding commitment scheme which is both concurrent NMc and concurrent NMd. In a full version of [8], they [TB| further gave a construction of a constant-round statistically hiding commitment scheme which is concurrent NMd and which actually consists of a simplified protocol with respect to the one presented in [Hj. The above schemes assume the existence of a family of pairs of claw-free permutations, require only a constant number of communication rounds, and assume that the commitment phase and the decommitment phase do not overlap in time.

For statistically binding commitment schemes, the first NMc one was designed by Dolev et al. [1] assuming the existence of one-way functions. However, the scheme requires $O(\log n)$ rounds, where $n$ is the security parameter. In the CRS model, Di Crescenzo et al. [10] constructed very efficient NMc commitment schemes based on any public-key cryptosystem that is non-malleable under chosen-plaintext attacks, in addition to any shared-key cryptosystem that enjoys indistinguishability under plaintext oracle CCA-post attack. In the plain model, Pass and Rosen [4,5] first constructed a constant-round NMc commitment scheme assuming the existence of collision-resistant hash functions. Pass and Rosen [6,7] then showed that the NMc scheme of [4,5] is actually a concurrent NMc one under a stronger simulation-based definition |j. The security proofs of mm require a non-black-box use of the code of the adversary, and moreover the one of [6,7] assumes that the commitment phase and the decommitment phase do not overlap in time. Lin et al. [T5| reconsidered the scheme of [1] and presented a concurrent NMc commitment scheme using only black-box techniques. Their scheme requires a polynomial number of communication rounds and is based on the minimal assumption, i.e., the existence of one-way functions. In addition to the above results focusing on NMc, the only one that explicitly claimed NMd commitment schemes was designed in [!5j (see Sec. 3) in the CRS model.

Before the clarification of [8], another folklore belief about statistically binding commitment schemes was that NMc implies NMd. However, this cannot be deduced, at least not just from the simulation-based definitions in mm in the plain model.⁴ The main problem is that the success probability of the stand-alone simulator is only required to be negligibly close to the success probability of the MIM adversary [5]. Recall that in the NMc proof of mm, a stand-alone simulator internally simulates the left interaction for the MIM adversary by committing to a bogus value $0^n$. It seems that this simulator cannot handle the NMd proof, because after receiving a committed value $m$, the simulator is unable to open the bogus commitment to $m$.

2 The values committed to by the adversary in a MIM execution are uniquely defined for all algorithms in the NMc definition m, but only for PPT algorithms in the relaxed definition. More recently, the NMc definition formulated in [5] can also be applied to computationally binding commitment schemes.

3 The NMd definition in Ed is stronger than that in [2]. The former is an indistinguishability-based definition, i.e., there exists a PPT stand-alone simulator that commits to a value which is computationally indistinguishable from the value committed to by the MIM adversary. The latter is a relation-based definition, i.e., the stand-alone simulator is less likely to commit to a value satisfying any polynomial-time computable relation than the value committed to by the MIM adversary.

Therefore, by simultaneously achieving concurrent NMc and concurrent NMd in a constant number of rounds under simulation-based notions, the work of [5] achieves the strongest security for commitment schemes in the plain model. However, the scheme is only computationally binding, which may not be sufficient in application scenarios where the security of receivers is of great concern. Thus, it remains an open problem whether a constant-round statistically binding commitment scheme that is both concurrent NMc and concurrent NMd exists in the plain model, under the stronger simulation-based definition |6I5I6I7II8] .

1.1 Our Contribution 

We solve the above problem by presenting a round-efficient concurrent non-malleable statistically binding commitment scheme. We show the following theorem.

Theorem 1. Suppose that there exists a family of pairs of claw-free permutations. Then there exists a constant-round statistically binding commitment scheme that is both concurrent NMc and concurrent NMd.

At a high level, the commitment phase of our scheme is almost identical to that in [5], and the technique used in this phase is also the same. More precisely, in addition to the technique used by mm, the two-witness technique of Feige m is also employed. Our contribution lies in the modification of the opening phase in order to simultaneously achieve concurrent NMd and the statistical binding property. We borrow the idea of [18] for designing concurrent zero-knowledge proofs, i.e., we let the committer guess the private values committed to by the receiver in the commitment phase, and then use a witness-indistinguishable proof system to prove a carefully designed statement. In this way, the scheme is guaranteed to prevent any unbounded adversary from opening the commitment in two different ways.

Our work can be viewed as a complement of the work of [8]. Both works resolve the non-malleability issues against concurrent man-in-the-middle attacks and achieve the same level of security in the plain model. The main difference between the two results is that the work of [8] focuses mainly on computationally binding commitment schemes, whereas our work considers statistically binding ones. Compared with the work of [6,7], our work also achieves both concurrent NMc and concurrent NMd, whereas they only achieve concurrent NMc.

4 There is no problem in the CRS model. The reader is referred to [8] for more details.
We emphasize here that our scheme inherits a limitation from 070, i.e., the non-malleability proof relies heavily on the assumption that the commitment phase and the decommitment phase do not overlap in time.

2 Preliminaries 

We assume the reader is familiar with witness-indistinguishable protocols, zero-knowledge protocols and commitment schemes. For more details, the reader is referred to [T5].

2.1 Concurrent Non-Malleable Commitments and Decommitments 

Next, we formulate the definitions of concurrent NMc and concurrent NMd. As stated in [6,7], we formalize the notion of non-malleability by a comparison between a man-in-the-middle execution and a simulated execution. Let $(C, R)$ be a commitment scheme and let $n \in \mathbb{N}$ be a security parameter.

The man-in-the-middle execution. In the MIM execution, the adversary $A$ simultaneously participates in $m(n) = \mathrm{poly}(n)$ left and $m(n)$ right interactions (WLOG, the number of commitments is the same in the left and right executions). In the $i$th left interaction, $A$ interacts with the committer $C$ to receive a commitment to a value $v_i$. In the $i$th right interaction, $A$ interacts with the receiver $R$ and tries to commit to a value $\tilde{v}_i$ of its choice. After the execution of the commitments in all interactions, $A$ executes the decommitments with $C$ and the decommitments with $R$. Prior to the interaction, the value vector $V = (v_1, \ldots, v_m)$ is given to $C$ as local input. $A$ also receives an auxiliary input $z$, which might contain a priori information about $V$.

Let the random variable $\mathrm{mim}^{\mathrm{com}}_{A}(V, z)$ denote the values $\tilde{v}_1, \ldots, \tilde{v}_m$ to which the adversary has committed in the right interactions. If the $i$th right commitment fails, or its transcript (of the commitment phase) is equal to the transcript of some left interaction, the value $\tilde{v}_i$ is set to $\perp$.

Similarly, we let the random variable $\mathrm{mim}^{\mathrm{open}}_{A}(V, z)$ denote the values $\tilde{v}_1, \ldots, \tilde{v}_m$ which the adversary has opened in the right interactions. If the $i$th right commitment or decommitment fails, or its transcript (of both the commitment phase and the decommitment phase) is equal to the transcript of some left interaction, the value $\tilde{v}_i$ is set to $\perp$.

The simulated execution. In the simulated execution, a simulator $S$ directly interacts with an honest receiver $R$ in $m(n)$ interactions. As in the MIM execution, the value vector $V = (v_1, \ldots, v_m)$ is chosen prior to the interaction, and $S$ receives some a priori information about $V$ as part of its auxiliary input $z$. $S$ first executes the commitment phases with $R$. Once all the commitment phases have been completed, $S$ receives the value vector $V$ and attempts to decommit to values $\tilde{v}_1, \ldots, \tilde{v}_m$.

Let the random variable $\mathrm{sim}^{\mathrm{com}}_{S}(V, z)$ denote the values $\tilde{v}_1, \ldots, \tilde{v}_m$ committed to by $S$. The value $\tilde{v}_i$ is set to $\perp$ if $S$ fails in the $i$th commitment phase. Let the random variable $\mathrm{sim}^{\mathrm{open}}_{S}(V, z)$ denote the values opened by $S$. The value $\tilde{v}_i$ is set to $\perp$ if $S$ fails in the $i$th commitment phase or decommitment phase.

Definition 1 (Concurrent Non-Malleable Commitment w.r.t. Commitment [Eli]). A commitment scheme $(C, R)$ is said to be concurrent non-malleable with respect to commitment if for every PPT man-in-the-middle adversary $A$ that participates in at most $m(n)$ left and $m(n)$ right interactions, there exists a PPT simulator $S$ such that the following two ensembles are computationally indistinguishable:

- $\{\mathrm{mim}^{\mathrm{com}}_{A}(V, z)\}_{V=(v_1,\ldots,v_m)\in\{0,1\}^{n\cdot m},\, z\in\{0,1\}^*}$
- $\{\mathrm{sim}^{\mathrm{com}}_{S}(V, z)\}_{V=(v_1,\ldots,v_m)\in\{0,1\}^{n\cdot m},\, z\in\{0,1\}^*}$

Definition 2 (Concurrent Non-Malleable Commitment w.r.t. Decommitment). A commitment scheme $(C, R)$ is said to be concurrent non-malleable with respect to decommitment if for every PPT man-in-the-middle adversary $A$ that participates in at most $m(n)$ left and $m(n)$ right interactions, there exists an expected PPT simulator $S$ such that the following two ensembles are computationally indistinguishable:

- $\{\mathrm{mim}^{\mathrm{open}}_{A}(V, z)\}_{V=(v_1,\ldots,v_m)\in\{0,1\}^{n\cdot m},\, z\in\{0,1\}^*}$
- $\{\mathrm{sim}^{\mathrm{open}}_{S}(V, z)\}_{V=(v_1,\ldots,v_m)\in\{0,1\}^{n\cdot m},\, z\in\{0,1\}^*}$

A commitment scheme that is non-malleable according to Definition [23 is liberal non-malleable rather than strict non-malleable [1,3]. Note that we follow [4,5,8] in that non-malleability is guaranteed only if the commitment phase and the decommitment phase do not overlap in time.

Strong signature schemes. A signature scheme SS = (Sgen, Ssig, Sver) is said to be strongly unforgeable under adaptive chosen-message attack if no efficient adversary, with access to a signing oracle with respect to the verification key VK, can output a valid message/signature pair $(m, \sigma)$ with non-negligible probability. Here “valid” means that Sver(VK, $m$, $\sigma$) = 1 and $(m, \sigma)$ does not correspond to any message/signature pair that was output by the signing oracle. A strong signature scheme is a signature scheme that is strongly unforgeable.

3 Constant-Round Statistically Binding Concurrent NMc 
and Concurrent NMd 

In this section, we present a constant-round statistically binding commitment scheme that is concurrent NMc and concurrent NMd. Denote by SBCom the statistically binding commitment scheme from any one-way function |2fl] . Denote by SHCom the statistically hiding commitment scheme from any collection of claw-free permutations with an efficiently-recognizable index set f2T\ . Denote by $\{(P_{\mathrm{tag}}, V_{\mathrm{tag}})\}_{\mathrm{tag}}$ the constant-round tag-based perfect non-malleable zero-knowledge argument of knowledge (NMZKAOK) for NP [4,5]. Denote by (swiP, swiV) the constant-round statistically witness-indistinguishable argument of knowledge (WIAOK) for NP [22,23].⁵ Let (cwiP, cwiV) be a constant-round computationally witness-indistinguishable proof of knowledge (WIPOK) for NP. Let SS = (Sgen, Ssig, Sver) be a strong signature scheme. The commitment scheme is shown in Fig. 1. Note that all the tools used above can be constructed assuming the existence of a family of pairs of claw-free permutations.

Our commitment scheme is a statistically binding variant of the one in [8]. The commitment phase is almost identical to that of the commitment scheme in |2>j, with the following exception: in Stage 2, the receiver R uses a statistically hiding commitment scheme SHCom instead of a statistically binding one. It also invokes the statistical WIAOK (swiP, swiV) instead of a computational WIPOK. Roughly, in Stage 1, the committer generates a commitment c to v and proves knowledge of an opening of c. In Stage 2, the receiver generates two commitments $c_0, c_1$ to two secrets $v_0, v_1$ respectively and proves knowledge of one of the two secrets. In Stage 3, the committer generates a signature on the transcript so far, and the receiver then verifies the correctness of the signature.

The decommitment phase is more involved and needs more careful design. The main difficulty lies in simultaneously achieving concurrent NMd and the statistical binding property. We are inspired by the work of [18] on concurrent zero-knowledge proofs. We modify the scheme in [8] by letting the committer guess the private values committed to in the commitment phase and then use a WIPOK to prove a carefully designed OR statement. The construction employs the two-witness technique of Feige m and the well-known FLS technique [24]. Roughly, in Stage 1', the committer first generates a commitment c' to a dummy value $0^n$. After receiving c', the receiver opens the values $v_0, v_1$ committed to in the commitment phase and proves knowledge of an opening of either commitment $c_0$ or $c_1$. In Stage 2', the committer sends the committed value v and runs a computational WIPOK to prove the statement that either c is a commitment to v, or c' is a commitment to $v_0$ or $v_1$. In Stage 3', the committer proves that c is a commitment to v, or that it knows an opening of $c_{b^*}$ to $v_{b^*}$ for some $b^* \in \{0,1\}$. In Stage 4', the committer generates a signature on the transcript so far, and the receiver then verifies the correctness of the signature. Note that the FLS technique is used in both Stage 2' and Stage 3'.

Theorem 2. Suppose that SBCom is a statistically binding commitment scheme, SHCom is a statistically hiding commitment scheme and SS = (Sgen, Ssig, Sver) is a strong signature scheme. Suppose that $\{(P_{\mathrm{tag}}, V_{\mathrm{tag}})\}_{\mathrm{tag}}$ is a one-many concurrent perfect NMZKAOK for NP, (swiP, swiV) is a statistical WIAOK for NP and (cwiP, cwiV) is a computational WIPOK for NP. Then (C, R) is a statistically binding commitment scheme that is both concurrent NMc and concurrent NMd.

5 Blum’s basic protocol for Hamiltonicity [23] is only computational zero-knowledge, with soundness error 1/2. Moreover, the protocol consists of three rounds of interaction. By running the basic protocol polynomially many times in parallel, we get a computational WIPOK for Hamiltonicity. If the prover uses a statistically hiding commitment scheme m in the first round, then we get a statistical WIAOK for Hamiltonicity.
Protocol (C, R)

Security Parameter: 1^n

String to be committed: v ∈ {0,1}^n

Commitment Phase:

Stage 1:

C → R : Let (pk, sk) ← Sgen(1^n). Pick uniformly r ∈ {0,1}^n and compute c ← SBCom(v; r). Send pk, c.

C ⇔ R : C uses witness (v, r) and proves using (P_pk, V_pk) (with tag pk) the statement that there exist values v, r ∈ {0,1}^n such that c = SBCom(v; r).

R : Abort if the above proof fails.

Stage 2:

R → C : Pick uniformly v_0, r_0, v_1, r_1 ∈ {0,1}^n and compute c_0 = SHCom(v_0; r_0), c_1 = SHCom(v_1; r_1). Send c_0, c_1.

R ⇔ C : Pick a random bit b ∈ {0,1}. R uses witness (v_b, r_b) and proves using (swiP, swiV) that there exist values v*, r* ∈ {0,1}^n such that c_0 = SHCom(v*; r*) or c_1 = SHCom(v*; r*).

C : Abort if the above proof fails.

Stage 3:

C → R : Let tr_0 be the transcript of the above interaction. Compute σ_0 ← Ssig(sk, tr_0) and send σ_0.

R : Verify that Sver(pk, tr_0, σ_0) = 1.

Decommitment Phase:

Stage 1':

C → R : Pick uniformly r' ∈ {0,1}^n. Compute c' = SBCom(0^n; r') and send c'.

R → C : Send v_0, v_1.

R ⇔ C : R uses witness r_b and proves using (swiP, swiV) (with tag pk) the statement that there exists a value r* ∈ {0,1}^n such that c_0 = SHCom(v_0; r*) or c_1 = SHCom(v_1; r*).

C : Abort if the above proof fails.

Stage 2':

C → R : Send v.

C ⇔ R : C uses witness r and proves using (cwiP, cwiV) the OR of the following statements:

1. ∃ r ∈ {0,1}^n s.t. c = SBCom(v; r),

2. ∃ b* ∈ {0,1}, r* ∈ {0,1}^n s.t. c' = SBCom(v_{b*}; r*).

R : Abort if the above proof fails.

Stage 3':

C ⇔ R : C uses witness r and proves using (P_pk, V_pk) (with tag pk) the statement that either there exists r ∈ {0,1}^n such that c = SBCom(v; r), or there exist b* ∈ {0,1}, r* ∈ {0,1}^n such that c_{b*} = SHCom(v_{b*}; r*).

R : Abort if the above proof fails.

Stage 4':

C → R : Let tr_1 be the transcript of the above interaction. Compute σ_1 ← Ssig(sk, tr_1) and send σ_1.

R : Verify that Sver(pk, tr_1, σ_1) = 1.

Fig. 1. Concurrent non-malleable statistically binding commitment scheme (C, R)
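To make the FLS-style OR statement of Stage 2' concrete, the Python sketch below writes out the NP relation that (cwiP, cwiV) proves; it does not implement the witness-indistinguishable proof itself. As a clearly labelled stand-in, the hypothetical `sbcom` is a SHA-256 hash commitment, which is only computationally binding, unlike the statistically binding SBCom of the scheme. The honest committer satisfies the relation with the opening r of c. A party that already knows $v_0$ or $v_1$ when it sends c', and sets c' accordingly, satisfies it for any claimed value v, while an honest c' (a commitment to $0^n$) leaves only the first branch available.

```python
import hashlib


# Hypothetical stand-in for SBCom (assumption): a SHA-256 hash commitment. It is
# only computationally binding, unlike the statistically binding SBCom of Fig. 1,
# and is used here solely to make the relation executable.
def sbcom(v: bytes, r: bytes) -> bytes:
    return hashlib.sha256(len(v).to_bytes(4, "big") + v + r).digest()


def stage2_relation(statement, witness) -> bool:
    """NP relation behind the OR statement proved with (cwiP, cwiV) in Stage 2'.

    statement = (c, v, c_prime, v0, v1)
    witness   = ("honest", r)                 : c  = SBCom(v; r)
              | ("trapdoor", b_star, r_star)  : c' = SBCom(v_{b*}; r*)
    """
    c, v, c_prime, v0, v1 = statement
    if witness[0] == "honest":
        _, r = witness
        return sbcom(v, r) == c
    if witness[0] == "trapdoor":
        _, b_star, r_star = witness
        return b_star in (0, 1) and sbcom((v0, v1)[b_star], r_star) == c_prime
    return False


n = 16
v, r = b"committed-value!", bytes(range(n))
v0, v1 = b"A" * n, b"B" * n                # values opened by R in Stage 1'
c = sbcom(v, r)                            # Stage 1 commitment
c_prime = sbcom(b"\x00" * n, b"\x01" * n)  # honest c' commits to 0^n
print(stage2_relation((c, v, c_prime, v0, v1), ("honest", r)))                 # True
print(stage2_relation((c, v, c_prime, v0, v1), ("trapdoor", 0, b"\x01" * n)))  # False
```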
Proof. We need to prove that the scheme satisfies the following three properties: computational hiding, statistical binding, and concurrent NMc together with concurrent NMd.

Computational hiding. Intuitively, the hiding property follows from the hiding property of SBCom and the perfect zero-knowledge property of $(P_{\mathrm{tag}}, V_{\mathrm{tag}})$. Suppose, on the contrary, that there exists an adversary R* that violates the hiding property of (C, R). Then we design an efficient adversary R' that breaks the hiding property of SBCom. R' proceeds as follows. On input a challenge com (i.e., a commitment to $m_0$ or $m_1$) from the committer of SBCom, R' internally incorporates R* and forwards the external commitment com to R* in Stage 1. All other executions are emulated by R' by following the honest committer strategy, except that R' runs the simulator for $(P_{\mathrm{tag}}, V_{\mathrm{tag}})$ in Stage 1. Finally, R' outputs whatever R* outputs. By the perfect zero-knowledge property of $(P_{\mathrm{tag}}, V_{\mathrm{tag}})$, if R* distinguishes the commitments made using (C, R), then R' distinguishes the commitments made using SBCom.

Statistical binding. The proof of the binding property is more subtle. We show that no malicious adversary C* can violate the binding property of (C, R). Intuitively, if C* can open the commitment in two different ways, then due to the soundness of (cwiP, cwiV) and the statistical binding property of SBCom, C* must use a fake witness in the execution of (cwiP, cwiV) in Stage 2', i.e., it knows a witness to the statement that c' is a commitment to $v_0$ or $v_1$. Note that the only place where C* might learn $v_0$ or $v_1$ before Stage 1' is Stage 2 of the commitment phase. Since both commitments $c_0, c_1$ are statistically hiding and (swiP, swiV) is statistical WI, C* learns $v_0$ or $v_1$ only with negligible probability in Stage 2. Thus, C* makes a commitment c' to the value $v_0$ or $v_1$ only with negligible probability in Stage 1'. Moreover, C* commits using SBCom, so the second statement proved in Stage 2' is false (even for an unbounded machine). By the soundness of (cwiP, cwiV), even an unbounded C* cannot successfully execute the proof in Stage 2' with non-negligible probability. This is a contradiction.

In more detail, assume for contradiction that there exists some adversary A (not necessarily PPT) that is able to violate the binding property of (C, R). We show how to construct an algorithm A' (not necessarily PPT) that violates the binding property of SBCom, the hiding property of SHCom, or the WI property of (swiP, swiV). A' interacts with A and follows the honest receiver strategy. Once the decommitment phase is finished, A' runs the extractor of (cwiP, cwiV) in Stage 2'. By the property of the extractor of (cwiP, cwiV), with overwhelming probability A' obtains a witness w. Then one of the following must hold: (1) $w = r$ s.t. $c = \mathrm{SBCom}(v; r)$; (2) $w = r^*$ s.t. $c' = \mathrm{SBCom}(v_b; r^*)$; (3) $w = r^*$ s.t. $c' = \mathrm{SBCom}(v_{1-b}; r^*)$.

We now show, however, that case 1 happens only with negligible probability. Assume by contradiction that it happens with non-negligible probability. Then we can design an algorithm B that breaks the binding property of SBCom. B proceeds exactly as A' and thus succeeds in extracting r such that $c = \mathrm{SBCom}(v; r)$. Next, B keeps rewinding A to the beginning of the decommitment phase until case 1 happens again. B again extracts a witness, which we denote by $r^*$. Let $v^*$ be the value opened by A. Now we get $c = \mathrm{SBCom}(v^*; r^*)$. By the assumption on A, $v \neq v^*$ with non-negligible probability. We have thus found a commitment c that can be opened in two different ways, which breaks the binding property of SBCom.

Next we show that case 2 happens only with negligible probability. Suppose, on the contrary, that with some non-negligible probability $c' = \mathrm{SBCom}(v_b; r^*)$. We can design an algorithm B that breaks the WI property of (swiP, swiV). B internally executes all the interactions with A and proceeds exactly as A', with the only exception that the proof of (swiP, swiV) in Stage 2 is generated by relaying all the messages to and from an external prover (B submits the opening information of $c_0$ and $c_1$ to the external prover, which then proves using a witness for $c_{b^*}$ for some $b^* \in \{0,1\}$). We emphasize that B generates the proof of Stage 2' itself. B thus successfully simulates the interactions with A in the decommitment phase and runs the extractor of (cwiP, cwiV). By looking at the extracted witness, B can guess the witness used by the external prover.

Finally, we show that case 3 happens only with negligible probability. Suppose, on the contrary, that with non-negligible probability $c' = \mathrm{SBCom}(v_{1-b}; r^*)$. We then design an algorithm B that breaks the hiding property of SHCom. On input a challenge commitment $c^*$ (to one of two values $\bar v_0, \bar v_1$), B has to decide which value $c^*$ corresponds to. B proceeds exactly as A' with the following two exceptions. The first exception lies in the handling of Stage 2 of the commitment phase. Here B first picks a random bit $b \in \{0,1\}$, a random string $v_b \in \{0,1\}^n$ and a uniformly random string $r_b$. B then computes $c_b = \mathrm{SHCom}(v_b; r_b)$ and sets $c_{1-b} = c^*$. Next, B continues the execution of Stage 2 by following the honest prover strategy of (swiP, swiV), using $(v_b, r_b)$ as witness. The second exception lies in the handling of Stage 1' of the decommitment phase. Here B randomly chooses a bit $b^* \in \{0,1\}$, sends $v_b$ and $v'_{1-b} = \bar v_{b^*}$ to A, and then uses the witness $r_b$ to complete the proof of (swiP, swiV). Finally, if the extracted witness $r^*$ satisfies $c' = \mathrm{SBCom}(v'_{1-b}; r^*)$, B outputs $v'_{1-b}$; otherwise, B outputs $\bar v_{b'}$ for a randomly chosen $b' \in \{0,1\}$. Therefore the probability that B breaks the hiding property of SHCom is also non-negligible.

Concurrent non-malleability. We need to show that the scheme is both concurrent NMc and concurrent NMd. The proof of concurrent NMc is almost identical to the proof in [5]. Note that NMc concerns only the commitment phase and, as discussed previously, the commitment phase of our scheme deviates from that of [8] only when R sends commitments and plays the WIAOK in Stage 2. Indeed, in our scheme we use statistical versions of these tools, while the protocol of [5] only needs the computational versions. The proof, however, goes through precisely as the one of [5]. We omit the details here and defer the proof to the full version.

Next, we show that the scheme is concurrent NMd. We show that for every PPT man-in-the-middle adversary A that participates in m(n) left commitments and m(n) right commitments, there exists an expected PPT simulator S such that for every PPT distinguisher D there exists a negligible function μ such that, for every value vector $V = (v_1, \ldots, v_m)$ with $v_i \in \{0,1\}^n$ and every $z \in \{0,1\}^*$, it holds that

$$\left|\Pr[D(\mathsf{mim}^{open}_{A}(V, z)) = 1] - \Pr[D(\mathsf{sim}^{open}_{S}(V, z)) = 1]\right| < \mu(n). \qquad (1)$$

Denote by $A_{dec}$ the state of A after the commitment phase, i.e., $A_{dec}$ contains A's description along with its configuration just before the decommitment phase starts.

We proceed by describing the simulator S. On input z and security parameter $1^n$, S interacts with external honest receivers and runs the adversary A internally. At a high level, during the commitment phases S internally incorporates A and emulates the commitment phases of all left interactions for A by honestly committing to $0^n$, while internally emulating the right interactions as an honest receiver. After all the commitment phases end, S invokes the extractors for all the proofs provided by A in the left and right commitments to extract the corresponding witnesses. More precisely, for each right commitment, S runs the extractor of $(P_{tag}, V_{tag})$, and we denote by $(v_i, r_i)$ the witness extracted in the i-th right commitment. For each left commitment, S runs the extractor of (swiP, swiV) to get a witness $(v_{b_i,i}, r_{b_i,i})$ with $b_i \in \{0,1\}$. Next, S plays the commitment phases with the external receivers: it follows the honest committer strategy and commits to $v_i$ in the i-th commitment phase.

Once all the commitment phases are finished, S receives a value vector $V = (v_1, \ldots, v_m)$ and has to perform the decommitment phases internally with $A_{dec}$. S follows the honest receiver strategy in all right decommitments. The i-th left decommitment is simulated as follows. In Stage 1', S acts as an honest committer, except that it commits to $v_{b_i,i}$ instead of $0^n$ (using randomness $r^f_i$). In Stage 2', S follows the honest committer strategy, except that it uses the “fake” witness $r^f_i$, i.e., the opening of its Stage 1' commitment to $v_{b_i,i}$. In Stage 3', S again uses the fake witness to complete the proof. S follows the honest committer strategy in Stage 4'. Finally, for each i, if $A_{dec}$ has successfully completed the i-th right decommitment, then S completes the decommitment phase of the i-th external execution with the honest receivers by opening the commitment to $v_i$.

Running time of S. By construction, S performs at most 2m extraction procedures in the commitment phases. The running times of the extractors of both $(P_{tag}, V_{tag})$ and (swiP, swiV) are expected polynomial. Since the extractions are executed sequentially, the total running time of all extractions is also expected polynomial. Furthermore, the MIM adversary A is a PPT algorithm, so each invocation of A takes polynomial time. Thus, S runs in expected polynomial time in the commitment phases. In the decommitment phases, S runs in a straight-line manner without any rewinding, so its running time there is strictly polynomial. We conclude that the overall running time of S is expected polynomial.

Next, we prove that the distribution of the messages opened by A when interacting with honest committers and honest receivers is indistinguishable from the distribution of the messages opened by A when interacting with S.

Indistinguishability of the simulation. We first consider the case in which there is only one left commitment and m(n) right commitments. Towards showing Equation (1) (note that V now contains only a single value v), we define a sequence of hybrid experiments $\{\mathsf{HYB}_i(v, z)\}_{1 \le i \le 7}$ that receive v and z as auxiliary inputs. The output of each experiment is the output of a PPT distinguisher D on input the value v and a vector of values V whose i-th element is defined as follows: if the i-th right decommitment completes successfully and its transcript differs from that of the left interaction, then $v_i$ is the value opened in the i-th right interaction; otherwise, $v_i$ is set to ⊥. Let $p_i = \Pr[\mathsf{HYB}_i(v, z) = 1]$.

$\mathsf{HYB}_1(v, z)$ proceeds exactly as S, except that in Stage 1 of the left commitment phase it runs the simulator of $(P_{tag}, V_{tag})$. Since the simulation is perfect, we conclude that $p_1 = \Pr[D(\mathsf{sim}^{open}_S(v, z)) = 1]$.

$\mathsf{HYB}_2(v, z)$ proceeds exactly as $\mathsf{HYB}_1$, except that in the left commitment phase, instead of feeding A a commitment to $0^n$ in Stage 1, $\mathsf{HYB}_2$ feeds A a commitment to v using SBCom. Since both $\mathsf{HYB}_1$ and $\mathsf{HYB}_2$ are efficiently computable, it follows directly from the computational hiding property of SBCom that $|p_1 - p_2|$ is negligible.

$\mathsf{HYB}_3(v, z)$ proceeds exactly as $\mathsf{HYB}_2$, except that it runs the simulator of $(P_{tag}, V_{tag})$ in Stage 3' of the left decommitment. By the perfect zero-knowledge property of $(P_{tag}, V_{tag})$, $p_3 = p_2$.

$\mathsf{HYB}_4(v, z)$ differs from $\mathsf{HYB}_3$ in that it uses the real witness (i.e., the decommitment of c) to complete the proof in Stage 2' of the left decommitment. By the computational WI property of (cwiP, cwiV), $|p_4 - p_3|$ is negligible.

$\mathsf{HYB}_5(v, z)$ differs from $\mathsf{HYB}_4$ in that it commits to $0^n$ in Stage 1' of the left decommitment. By the computational hiding property of SBCom, $|p_5 - p_4|$ is negligible.

$\mathsf{HYB}_6(v, z)$ proceeds exactly as $\mathsf{HYB}_5$, except that it uses the real witness (i.e., the decommitment of c) to complete the proof in Stage 3' of the left decommitment. By the perfect zero-knowledge property of $(P_{tag}, V_{tag})$, $p_6 = p_5$.

$\mathsf{HYB}_7(v, z)$ proceeds exactly as $\mathsf{HYB}_6$, except that it does not run the extractor of (swiP, swiV) in Stage 2 of the left commitment phase. Since the extraction fails only with negligible probability, $|p_7 - p_6|$ is negligible.

Note that $\mathsf{HYB}_7$ differs from the real game in that it runs the simulator-extractor of $(P_{tag}, V_{tag})$ in Stage 1 of the commitment phase. By the description of $\mathsf{HYB}_7$, it opens to the external receivers the values it extracts from A at the end of the commitment phase in the simulated game. Moreover, in the real game the adversary A cannot open its commitments in a different way (cf. Claim 1). Thus, A opens to the external receivers the values it committed to in the commitment phase. By the simulation-extractability property of $(P_{tag}, V_{tag})$, the simulation is perfect and the extraction fails only with negligible probability in each right commitment whose tag differs from that of the left decommitment (if the tags are the same, the security of the signature scheme is violated). Therefore, except with negligible probability, $\mathsf{HYB}_7$ opens to the external receivers the same values opened by A in the real game, i.e., $|p_7 - \Pr[D(\mathsf{mim}^{open}_A(v, z)) = 1]|$ is negligible.

Combining the hybrids, we obtain Equation (1). This completes the proof of Theorem 2.
Claim 1. In the real game, A cannot open its commitments in a different way.

Proof. Assume that, with some probability p, there exists $i \in [m]$ such that the value v' opened in the i-th right decommitment differs from the value v committed in the i-th right commitment⁶. We denote by $c_0, c_1$ the commitments made by R in Stage 2 of the i-th right commitment, and by b the bit such that R uses the decommitment information of $c_b$ to complete the proof of (swiP, swiV) in Stage 2 and Stage 1' of the i-th right interaction. If A successfully completes the i-th right decommitment, we consider the following experiment B. On input i, v and z, B interacts with A as follows. B commits to v in the left commitment phase. Furthermore, B follows the honest committer strategy in the left interaction and the honest receiver strategy in all right interactions, except that in Stage 1' of the i-th decommitment B sends $v_b$ and a randomly chosen $v'_{1-b}$ (which, with overwhelming probability, differs from the $v_{1-b}$ chosen in the commitment phase; this is important for the proof of Case 3 below). Once the i-th right decommitment is over, B runs the extractor of $(P_{tag}, V_{tag})$ in Stage 3'. Note that the views of A in a real execution and in an execution of B are identical, so the probability that A opens in a different way in the execution of B is also p. By the property of the extractor of $(P_{tag}, V_{tag})$, with probability $p' = p - \epsilon(n)$, where ε is a negligible function, B obtains a witness w, and one of the following three cases must happen:

1. $w = \tilde r$ s.t. $c = \mathrm{SBCom}(v'; \tilde r)$ (c is the commitment generated by A in the i-th right commitment).

2. $w = r^*$ s.t. $c_b = \mathrm{SHCom}(v_b; r^*)$ ($v_b, v'_{1-b}$ are the values opened by B in Stage 1' of the i-th right decommitment).

3. $w = r^*$ s.t. $c_{1-b} = \mathrm{SHCom}(v'_{1-b}; r^*)$.

Let $p' = p_1 + p_2 + p_3$, where $p_j$ is the probability that Case j happens (for j = 1, 2, 3). In Case 1, the statistical binding property of SBCom implies that v' must equal v, so A does not open in a different way; therefore $p_1$ must be negligible. Recall that in the i-th right commitment, B already generates commitments $c_0$ and $c_1$ to two different values $v_0$ and $v_1$, respectively, and uses knowledge of the decommitment of $c_b$ for a random bit b in both (swiP, swiV) of Stage 2 and (swiP, swiV) of Stage 1'. By the statistical hiding property of SHCom and the statistical WI property of (swiP, swiV), $p_2$ is essentially identical to $p_3$, and therefore both $p_2$ and $p_3$ are roughly $p'/2 - \epsilon(n)$, where ε is a negligible function.

We can now conclude the proof by showing that $p_3$ must be negligible; summing up, p is then negligible as well, and therefore A cannot open in a different way with non-negligible probability. Indeed, notice that when Case 3 happens, A has produced an opening of $c_{1-b}$ to the value $v'_{1-b}$, although no opening of $c_{1-b}$ was ever used in the two executions of (swiP, swiV). In addition to the opening from the commitment phase (note that B generates this commitment itself), we thus obtain two openings of SHCom to different values. Since SHCom is computationally binding, this happens only with negligible probability.

⁶ The committed value is the one uniquely specified by the statistically binding commitment scheme SBCom.

Extending to many-many concurrent NMd. Next, we present a proof sketch for the many-many concurrent case. We show that the two ensembles $\{\mathsf{mim}^{open}_A(V, z)\}$ and $\{\mathsf{sim}^{open}_S(V, z)\}$ are computationally indistinguishable. Suppose, for contradiction, that this is not the case. That is, there exist a PPT distinguisher D and a polynomial p(n) such that for infinitely many $n \in \mathbb{N}$ there exist a value vector $V = (v_1, \ldots, v_m)$ and $z \in \{0,1\}^*$ such that D distinguishes $\mathsf{mim}^{open}_A(V, z)$ and $\mathsf{sim}^{open}_S(V, z)$ with probability at least 1/p(n). Fix a generic n for which this happens. We design a sequence of hybrid experiments $\{\mathsf{HYB}_i(V, z)\}_{0 \le i \le m}$, where $\mathsf{HYB}_i(V, z)$ is defined as follows. $\mathsf{HYB}_i$ proceeds as S, except that it emulates the j-th left commitment phase by committing to $v_j$ if $j \le i$, and to $0^n$ otherwise. Moreover, it emulates the j-th left decommitment using a valid witness for $v_j$ if $j \le i$, and a false witness otherwise. It directly follows that $\mathsf{sim}^{open}_{\mathsf{HYB}_m}(V, z) = \mathsf{mim}^{open}_A(V, z)$ and $\mathsf{sim}^{open}_{\mathsf{HYB}_0}(V, z) = \mathsf{sim}^{open}_S(V, z)$. By a standard hybrid argument there exists an $i \in [m]$ such that

$$\left|\Pr[D(\mathsf{sim}^{open}_{\mathsf{HYB}_{i-1}}(V, z)) = 1] - \Pr[D(\mathsf{sim}^{open}_{\mathsf{HYB}_i}(V, z)) = 1]\right| \ge \frac{1}{m(n) \cdot p(n)}. \qquad (2)$$

Note that the only difference between experiments $\mathsf{HYB}_{i-1}(V, z)$ and $\mathsf{HYB}_i(V, z)$ lies in the i-th left interaction: in $\mathsf{HYB}_i$, A receives a commitment to $v_i$ and a corresponding decommitment generated using a valid witness, whereas in $\mathsf{HYB}_{i-1}$ it receives a commitment to $0^n$ and a corresponding decommitment generated using a false witness.
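For completeness, the "standard hybrid argument" behind Equation (2) is the usual telescoping bound. The short derivation below is our own expansion (not from the original text); it only uses the identifications of $\mathsf{HYB}_m$ with the real execution and $\mathsf{HYB}_0$ with the simulation stated above, and writes $P_j = \Pr[D(\mathsf{sim}^{open}_{\mathsf{HYB}_j}(V, z)) = 1]$ as a shorthand.

```latex
\frac{1}{p(n)} \;\le\; \bigl|P_m - P_0\bigr|
  \;=\; \Bigl|\sum_{i=1}^{m(n)} \bigl(P_i - P_{i-1}\bigr)\Bigr|
  \;\le\; \sum_{i=1}^{m(n)} \bigl|P_i - P_{i-1}\bigr|
\quad\Longrightarrow\quad
\exists\, i \in [m(n)] :\; \bigl|P_{i-1} - P_i\bigr| \;\ge\; \frac{1}{m(n)\,p(n)} .
```

The last step holds because the largest of $m(n)$ non-negative terms is at least their average.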

We then design an efficient MIM adversary A' that breaks the one-many concurrent non-malleability of (C, R). On auxiliary input $z' = (n, i, V, z)$, A' proceeds as follows. A' internally incorporates A(z) and emulates the left and right interactions for A. A' relays the messages in all right interactions between A and the external receivers. In the i-th left interaction, A' relays either the messages between an external committer and A, or the messages between the simulator of the one-many concurrent case and A. For $j \in [m]$ with $j \neq i$, A' internally emulates the j-th left commitment phase for A by committing to $v_j$ if $j < i$, and to $0^n$ otherwise. Moreover, A' emulates the j-th left decommitment for A using a valid witness if $j < i$, and a false witness otherwise. By construction, the output of A' is distributed as $\mathsf{sim}^{open}_{\mathsf{HYB}_i}(V, z)$ when the i-th left interaction is run with an honest external committer (on value $v_i$), and as $\mathsf{sim}^{open}_{\mathsf{HYB}_{i-1}}(V, z)$ when it is run with the one-many simulator.

Therefore, by Equation (2), A' breaks the one-many concurrent non-malleability of (C, R).

4 Concluding Remarks 

Together with previous work, our result shows that there exist constant-round commitment schemes that are secure even against very powerful adversaries, as long as there is a time barrier between the commitment and decommitment phases. An interesting open question is whether commitment schemes can remain secure even without such a barrier. The question is interesting even without requiring constant round complexity.
Abstract. This paper describes an extremely efficient squaring operation in the so-called ‘cyclotomic subgroup’ of $\mathbb{F}_{q^6}^*$, for $q \equiv 1 \bmod 6$. Our result arises from considering the Weil restriction of scalars of this group from $\mathbb{F}_{q^6}$ to $\mathbb{F}_{q^2}$, and provides efficiency improvements for both pairing-based and torus-based cryptographic protocols. In particular, we argue that such fields are ideally suited for the latter when the field characteristic satisfies $p \equiv 1 \pmod 6$, and since torus-based techniques can be applied to the former, we present a compelling argument for the adoption of a single approach to efficient field arithmetic for pairing-based cryptography.

Keywords: Pairing-based cryptography, torus-based cryptography, 
finite field arithmetic. 


1 Introduction 

Pairing-based cryptography has provoked a wealth of research activity since 
the first cryptographically constructive application of pairings was proposed by 
Joux in 2000 pH] . Since then, numerous further applications of pairings have 
been proposed and their place in the modern cryptographers’ toolkit is now 
well established. As a result, much research activity has focused on algorithmic, 
arithmetic and implementation issues in the computation of pairings themselves, 
in order to ensure the viability of such systems [311212118] . 

In practice, pairings are typically instantiated using an elliptic or a hyperelliptic curve over a finite field, via the Weil or Tate pairing (see [B]), or a variant
of the latter such as the ate [TB], or R-ate pairing [23 ■ These pairings map pairs 
of points on such curves to elements of a subgroup of the multiplicative group 
of an extension field, which is contained in the so-called cyclotomic subgroup. 

Properties of the cyclotomic subgroup can be exploited to obtain faster arithmetic or more compact representations than are possible for general elements of the extension field. Cryptosystems such as LUC fMj and XTR [j2B], and the observations of Stam and Lenstra [3!] and Granger, Page and Stam [IB], all exploit membership of this subgroup to achieve fast exponentiation. Many pairing-based
protocols require exponentiation in the cyclotomic subgroup, as does the ‘hard’ 
part of the final exponentiation of a pairing computation, and so these ideas can 
naturally be applied in this context mm- 

Currently there is a huge range of parametrisation options and algorithmic choices to be made when implementing pairings, and in order to facilitate a simple and unified approach to the construction of extension fields used in pairings, in 2005 Koblitz and Menezes introduced the concept of Pairing-Friendly Fields (PFFs) [23]. These are extension fields $\mathbb{F}_{p^k}$ with $p \equiv 1 \pmod{12}$ and $k = 2^a 3^b$, with $a \ge 1$ and $b \ge 0$. Such specialisation enables algorithms and implementations to be highly optimised. Indeed, for ordinary elliptic curves the 2008 IEEE ‘Draft Standard for Identity-based Public-key Cryptography using Pairings’ (P1363.3/D1) deals exclusively with fields of this form [19].

In 2006 Granger, Page and Smart proposed a method for fast squaring in the cyclotomic subgroup of PFFs [Hj. However, even for degree six extensions the method was almost 50% slower than the Stam-Lenstra result [33]; the latter, however, does not permit the use of the highly efficient sextic twists available to the former, and so is not practical in this context. Both of these methods rely on taking the Weil restriction of scalars of the equation that defines membership of the cyclotomic subgroup, in order to obtain a variety over $\mathbb{F}_p$. The defining equations of this variety are then exploited to improve squaring efficiency. Rather than descending to the base field $\mathbb{F}_p$, in this paper we show that descending only to a cubic subfield enables one to square with the same efficiency as Stam-Lenstra for degree six extensions, and at between 60% and 75% of the cost of the next best method for the cryptographically interesting extension degrees 12, 18 and 24.

In tandem with the results of [5], which show that PFFs are not always the most efficient field constructions for pairing-based cryptography, we present a compelling argument for the adoption of a single approach to efficient field arithmetic for pairing-based cryptography, based on the use of fields of the form $\mathbb{F}_{q^6}$ for $q \equiv 1 \bmod 6$. These fields intersect with those listed in [19], which lends strong support to their possible standardisation. Moreover, since these recommendations can be improved upon, and since the recommended security parameters section of the latest draft of this standard is empty [2D], we believe that our proposed fields should now be given serious consideration for inclusion.

The sequel is organised as follows. In §2 we describe our field construction and in §3 present our fast squaring formulae. Then in §4 we compare our approach with previous results, and in §5 and §6 apply our result to pairing-based and torus-based cryptography respectively. We conclude in §7.

2 Pairing, Towering and Squaring-Friendly Fields 

Pairing-friendly fields were introduced to allow the easy construction of, and efficient arithmetic within, extension fields relevant to pairing-based cryptography (PBC), and are very closely related to Optimal Extension Fields [T]. In particular, we have the following result from [23]:
Theorem 1. Let $\mathbb{F}_{p^k}$ be a PFF, and let β be an element of $\mathbb{F}_p$ that is neither a square nor a cube in $\mathbb{F}_p$. Then the polynomial $X^k - \beta$ is irreducible over $\mathbb{F}_p$.

Observe that for ‘small’ β, reduction modulo $X^k - \beta$ can be implemented very efficiently. Observe also that the form of the extension degree is important for applications. When 6 | k, the presence of sextic twists for elliptic curves with discriminant D = −3 allows for very efficient pairing computation, while for 4 | k one can use the slightly less efficient quartic twists. Such extensions also permit the use of compression methods based on taking traces or utilising the rationality of algebraic tori [30]. Furthermore, $\mathbb{F}_{p^k}$ may be constructed as a sequence of Kummer extensions, by successively adjoining the square or cube root of β, then the square or cube root of that, as appropriate, until the full extension is reached.

As shown in [3], the condition $p \equiv 1 \pmod{12}$ is somewhat spurious, in that PFFs do not always yield the most efficient extension towers, and it excludes families of pairing-friendly curves that have since been discovered [22]. For the Barreto-Naehrig curves [2], for example, which have embedding degree twelve, $p \equiv 3 \bmod 4$ is preferred since one can use the highly efficient quadratic subfield $\mathbb{F}_{p^2} = \mathbb{F}_p[x]/(x^2 + 1)$. To allow for the inclusion of such fields, Benger and Scott introduced the following concept [5]:

Definition 1. A Towering-Friendly Field (TFF) is a field of the form $\mathbb{F}_{q^m}$ for which all prime divisors of m also divide q − 1.

As with PFFs, TFFs allow a given tower of field extensions to be constructed via successive root extractions, but importantly stipulate less restrictive congruence conditions on the base field cardinality. For example, as above for BN curves with $p \equiv 3 \pmod 4$, the extension $\mathbb{F}_{p^{12}}$ is not a PFF, whereas the degree six extension of $\mathbb{F}_{p^2}$ is towering-friendly, since $p^2 - 1 \equiv 0 \pmod 6$, cf. §5.2. This definition thus captures those considerations relevant to pairing-based cryptography (PBC). We refer the reader to [5] for details of the construction of efficient TFFs.

All of the fields for PBC that follow shall be TFFs of special extension degree $k = 2^a 3^b$ with $a, b \ge 1$, i.e., with 6 | k. Should it not cause confusion, we also refer to any field of the form $\mathbb{F}_{q^6}$ for which $q \equiv 1 \bmod 6$ as a Squaring-Friendly Field (SFF), a name whose aptness will become clear in §3. Thus all SFFs are TFFs, and all TFFs used for PBC in this paper are SFFs.


3 New Fast Squaring in the Cyclotomic Subgroup 

In this section we derive efficient squaring formulae for elements of the cyclotomic subgroup of TFFs when the extension degree is of the form $k = 2^a 3^b$ with $a, b \ge 1$, i.e., for SFFs. This is the subgroup of $\mathbb{F}_{p^k}^*$ of order $\Phi_k(p)$, where $\Phi_k$ is the k-th cyclotomic polynomial, which for 6 | k is always of the form:

$$\Phi_{2^a 3^b}(x) = x^{2 \cdot 2^{a-1} 3^{b-1}} - x^{2^{a-1} 3^{b-1}} + 1.$$
We denote the cyclotomic subgroup by $G_{\Phi_k(p)}$, membership of which can be defined as follows:

$$G_{\Phi_k(p)} = \{\alpha \in \mathbb{F}_{p^k}^* \mid \alpha^{\Phi_k(p)} = 1\}. \qquad (1)$$

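The condition on α in (1) defines a variety V over $\mathbb{F}_{p^k}$. For d | k let $\mathbb{F}_{p^d} \subseteq \mathbb{F}_{p^k}$. We write $\mathrm{Res}_{\mathbb{F}_{p^k}/\mathbb{F}_{p^d}} V$ for the Weil restriction of scalars of V from $\mathbb{F}_{p^k}$ to $\mathbb{F}_{p^d}$. Then $\mathrm{Res}_{\mathbb{F}_{p^k}/\mathbb{F}_{p^d}} V$ is a variety defined over $\mathbb{F}_{p^d}$ for which we have a morphism

$$\pi : \mathrm{Res}_{\mathbb{F}_{p^k}/\mathbb{F}_{p^d}} V \to V$$

defined over $\mathbb{F}_{p^k}$ that induces an isomorphism

$$\pi : (\mathrm{Res}_{\mathbb{F}_{p^k}/\mathbb{F}_{p^d}} V)(\mathbb{F}_{p^d}) \to V(\mathbb{F}_{p^k}).$$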

We refer the reader to Section 1.3 of [37] for more on the restriction of scalars. 

While not stated explicitly, all prior results for fast squaring in $G_{\Phi_k(p)}$ exploit the form of the Weil restriction of this variety to a subfield. Stam and Lenstra restrict $G_{\Phi_6(p)}$ from $\mathbb{F}_{p^6}$ to $\mathbb{F}_p$ and $G_{\Phi_2(p)}$ from $\mathbb{F}_{p^2}$ to $\mathbb{F}_p$ [33], and similarly Granger et al. restrict $G_{\Phi_k(p)}$ from $\mathbb{F}_{p^k}$ to $\mathbb{F}_p$ [15].

Observe that $\Phi_{2^a 3^b}(x) = \Phi_6(x^{2^{a-1} 3^{b-1}})$, and so we have the following simplification:

$$G_{\Phi_k(p)} = G_{\Phi_6(p^{k/6})}.$$

We therefore need only consider $G_{\Phi_6(q)}$ where $q = p^{k/6}$. Observe also that $\Phi_6(q) \mid \Phi_2(q^3)$, and so

$$G_{\Phi_6(q)} \subseteq G_{\Phi_2(q^3)}.$$

Hence one can alternatively employ the simplest non-trivial restriction of $G_{\Phi_6(q)}$, or rather of $G_{\Phi_2(q^3)}$, from $\mathbb{F}_{q^6}$ to $\mathbb{F}_{q^3}$, as in [33]. This reduces the cost of squaring in $G_{\Phi_2(q^3)}$, and hence in $G_{\Phi_6(q)}$, from two $\mathbb{F}_{q^3}$-multiplications to two $\mathbb{F}_{q^3}$-squarings, as we shall see in §3.1.
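The two polynomial facts used above, $\Phi_{2^a 3^b}(x) = \Phi_6(x^{2^{a-1}3^{b-1}}) = x^{2 \cdot 2^{a-1}3^{b-1}} - x^{2^{a-1}3^{b-1}} + 1$ and $\Phi_6(q) \mid \Phi_2(q^3)$, are easy to sanity-check numerically. The sketch below is our own illustration, not code from the paper: it evaluates $\Phi_n$ at integers via the Möbius product formula $\Phi_n(x) = \prod_{d \mid n} (x^d - 1)^{\mu(n/d)}$. The helper names `mobius` and `cyclotomic_value` are hypothetical.

```python
def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # p^2 divides n
            result = -result
        p += 1
    if m > 1:  # one remaining prime factor
        result = -result
    return result


def cyclotomic_value(n, x):
    """Phi_n(x) for an integer x >= 2, via Phi_n(x) = prod_{d | n} (x^d - 1)^mu(n/d)."""
    num, den = 1, 1
    for d in range(1, n + 1):
        if n % d == 0:
            mu = mobius(n // d)
            if mu == 1:
                num *= x ** d - 1
            elif mu == -1:
                den *= x ** d - 1
    assert num % den == 0  # the quotient is always an integer
    return num // den


# Example: Phi_6(5) = 25 - 5 + 1 and Phi_12(3) = 81 - 9 + 1.
print(cyclotomic_value(6, 5), cyclotomic_value(12, 3))
```

Exact integer arithmetic is used so the identities are checked without rounding; this is only a verification aid, not part of any squaring routine.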

Our simple idea is to use the next non-trivial Weil restriction of $G_{\Phi_6(q)}$, which is from $\mathbb{F}_{q^6}$ to $\mathbb{F}_{q^2}$. This rather fortuitously provides the fastest squaring formulae yet discovered for the cyclotomic subgroups of SFFs, yielding an even greater efficiency gain than the Stam-Lenstra formulae for $G_{\Phi_2(q^3)}$ (cf. Table 1), while providing a more systematic and general framework than the more ad-hoc method of jSlj. Restrictions to other subfields for higher extension degrees of interest do not seem to yield better results; we leave this as an open problem.

3.1 Fast Squaring in $\mathrm{Res}_{\mathbb{F}_{q^2}/\mathbb{F}_q} G_{\Phi_2(q)}$

Let $\mathbb{F}_{q^2} = \mathbb{F}_q[x]/(x^2 - i)$ with i a quadratic non-residue in $\mathbb{F}_q$, and consider the square of a generic element $\alpha = a + bx$:

$$\alpha^2 = (a + xb)^2 = a^2 + 2abx + b^2x^2 = a^2 + ib^2 + 2abx = (a + ib)(a + b) - ab(1 + i) + 2abx.$$

This operation can be performed at the cost of two $\mathbb{F}_q$-multiplications and a few additions.
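As a minimal sketch of the two-multiplication squaring above, the following Python code works in $\mathbb{F}_{q^2} = \mathbb{F}_q[x]/(x^2 - i)$ for the illustrative prime q = 31 (an assumption chosen for readability, far too small for cryptographic use) and compares the formula with schoolbook multiplication. The names `Q`, `I`, `f2_mul` and `f2_square` are hypothetical.

```python
Q = 31  # illustrative prime (assumption); real parameters are hundreds of bits


def quadratic_nonresidue(q):
    """Smallest i with Euler criterion i^((q-1)/2) = -1 mod q."""
    for i in range(2, q):
        if pow(i, (q - 1) // 2, q) == q - 1:
            return i
    raise ValueError("no quadratic non-residue found")


I = quadratic_nonresidue(Q)


def f2_mul(u, v):
    """Schoolbook product in F_q[x]/(x^2 - I); u = (a, b) stands for a + b*x."""
    a, b = u
    c, d = v
    return ((a * c + I * b * d) % Q, (a * d + b * c) % Q)


def f2_square(u):
    """(a + bx)^2 = (a + I*b)(a + b) - ab(1 + I) + 2ab*x with two general F_q products."""
    a, b = u
    t = (a + I * b) * (a + b)  # product 1 (I*b is a scaling by a fixed small constant)
    ab = a * b                 # product 2
    return ((t - ab * (1 + I)) % Q, (2 * ab) % Q)


print(f2_square((5, 7)) == f2_mul((5, 7), (5, 7)))
```

The multiplications by the constant I and by 1 + I are counted as cheap here, matching the text's assumption of a ‘small’ non-residue.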
If, however, $\alpha \in G_{\Phi_2(q)}$, we have $\alpha^{q+1} = 1$, or $\alpha^q \cdot \alpha = 1$. Observe that:

$$\alpha^q = (a + xb)^q = a + bx^q = a + b\,x^{2(q-1)/2} \cdot x = a + b\,i^{(q-1)/2} \cdot x = a - bx,$$

since i is a quadratic non-residue. Hence the variety defined by the cyclotomic subgroup membership equation (1) is $(a + xb)(a - xb) = 1$, or $a^2 - x^2b^2 = 1$, or $a^2 - ib^2 = 1$. Note that this results in just one equation over $\mathbb{F}_q$, rather than two. Substituting from this equation into the squaring formula, one obtains

$$\alpha^2 = (a + xb)^2 = 2a^2 - 1 + \left[(a + b)^2 - a^2 - (a^2 - 1)/i\right]x,$$

where now the main cost of computing this is just two $\mathbb{F}_q$-squarings. Observe that if i is ‘small’ (for example if i = −1 for $p \equiv 3 \pmod 4$ when $\mathbb{F}_q = \mathbb{F}_p$), then the above simplifies considerably.
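The compressed formula can be checked with a small sketch, again assuming the toy prime q = 31 and hypothetical helper names. Elements of $G_{\Phi_2(q)}$ are produced by raising a nonzero element to the power q − 1, since $(u^{q-1})^{q+1} = u^{q^2-1} = 1$; the squaring itself then uses only the two $\mathbb{F}_q$-squarings $a^2$ and $(a+b)^2$, recovering $b^2$ from the norm equation $a^2 - ib^2 = 1$.

```python
Q = 31  # illustrative prime (assumption)


def quadratic_nonresidue(q):
    for i in range(2, q):
        if pow(i, (q - 1) // 2, q) == q - 1:
            return i
    raise ValueError("no quadratic non-residue found")


I = quadratic_nonresidue(Q)
INV_I = pow(I, -1, Q)  # modular inverse of I (Python 3.8+)


def f2_mul(u, v):
    """Product in F_q[x]/(x^2 - I); u = (a, b) stands for a + b*x."""
    a, b = u
    c, d = v
    return ((a * c + I * b * d) % Q, (a * d + b * c) % Q)


def f2_pow(u, n):
    """Left-to-right-free square-and-multiply exponentiation in F_{q^2}."""
    result = (1, 0)
    while n:
        if n & 1:
            result = f2_mul(result, u)
        u = f2_mul(u, u)
        n >>= 1
    return result


def cyclotomic_square(u):
    """Square u = a + b*x, valid only if a^2 - I*b^2 == 1 (u in G_{Phi_2(q)})."""
    a, b = u
    a2 = a * a % Q              # squaring 1
    s2 = (a + b) * (a + b) % Q  # squaring 2
    b2 = (a2 - 1) * INV_I % Q   # b^2 recovered from a^2 - I*b^2 = 1
    return ((2 * a2 - 1) % Q, (s2 - a2 - b2) % Q)


beta = f2_pow((3, 5), Q - 1)  # map a nonzero element into G_{Phi_2(q)}
print(cyclotomic_square(beta) == f2_mul(beta, beta))
```

Here the division by i is done with a precomputed inverse; for a ‘small’ i such as −1 it reduces to a sign change, as the text notes.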

3.2 Fast Squaring in $\mathrm{Res}_{\mathbb{F}_{q^6}/\mathbb{F}_{q^2}} G_{\Phi_6(q)}$

Let $\mathbb{F}_{q^6} = \mathbb{F}_q[z]/(z^6 - i)$, with $i \in \mathbb{F}_q$ a sextic non-residue. The standard representation for a general element of this extension is

$$\alpha = \alpha_0 + \alpha_1 z + \alpha_2 z^2 + \alpha_3 z^3 + \alpha_4 z^4 + \alpha_5 z^5.$$

However, in order to make the subfield structure explicit, we write elements of $\mathbb{F}_{q^6}$ in two possible ways, each of which will be convenient depending on the context: firstly as a compositum of $\mathbb{F}_{q^2}$ and $\mathbb{F}_{q^3}$, and secondly as a cubic extension of a quadratic extension.

$\mathbb{F}_{q^6}$ as a compositum. Let

$$\alpha = (a_0 + a_1 y) + (b_0 + b_1 y)x + (c_0 + c_1 y)x^2 = a + bx + cx^2, \qquad (2)$$

where $\mathbb{F}_{q^2} = \mathbb{F}_q[y]/(y^2 - i)$ with $y = z^3$, and $\mathbb{F}_{q^3} = \mathbb{F}_q[x]/(x^3 - i)$ with $x = z^2$. Note that $a, b, c \in \mathbb{F}_{q^2}$. One can therefore regard this extension as the compositum of the stated degree two and degree three extensions of $\mathbb{F}_q$:

$$\mathbb{F}_{q^6} = \mathbb{F}_q(z) = \mathbb{F}_{q^3}(y) = \mathbb{F}_{q^2}(x),$$

with the isomorphisms as given above. Viewing α in the latter form, its square is simply:

$$\alpha^2 = (a + bx + cx^2)^2 = a^2 + 2abx + (2ac + b^2)x^2 + 2bcx^3 + c^2x^4$$
$$= (a^2 + 2ibc) + (2ab + ic^2)x + (2ac + b^2)x^2 = A + Bx + Cx^2. \qquad (3)$$

As before, we use the characterising equation (1) for membership of $G_{\Phi_6}(q)$, which in this case is $\alpha^{q^2 - q + 1} = 1$. To Weil restrict to $\mathbb{F}_{q^2}$, we first calculate how the Frobenius automorphism acts on our chosen basis. Firstly, since $i$ is a quadratic non-residue, we have

$$y^q = y^{q-1} \cdot y = i^{(q-1)/2} \cdot y = -y.$$

Hence $a^q = (a_0 + a_1 y)^q = a_0 - a_1 y$, which for simplicity we write as $\bar{a}$, and similarly for $b^q$ and $c^q$. Furthermore, since $i$ is a cubic non-residue, we have

$$x^q = x^{q-1} \cdot x = (x^3)^{(q-1)/3} \cdot x = i^{(q-1)/3} \cdot x = \omega x,$$

where $\omega$ is a primitive cube root of unity in $\mathbb{F}_q$. Applying the Frobenius again gives $x^{q^2} = \omega^2 x$. Note that the above computations necessitate $q \equiv 1 \pmod 6$, which is satisfied thanks to the definition of SFFs.
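The Frobenius actions $y^q = -y$ and $x^q = \omega x$ can be confirmed numerically. This sketch assumes the same illustrative toy prime and non-residue search as before, and raises $y$ and $x$ to the $q$-th power by square-and-multiply in $\mathbb{F}_q[t]/(t^n - i)$; the helper names `mul_mod` and `pow_mod` are hypothetical.

```python
P = 31  # toy prime with P % 6 == 1 (illustrative assumption)
I = next(i for i in range(2, P)
         if pow(i, (P - 1) // 2, P) != 1 and pow(i, (P - 1) // 3, P) != 1)

def mul_mod(u, v, n):
    """Product of coefficient lists u, v (low degree first) in F_P[t]/(t^n - I)."""
    out = [0] * n
    for j, uj in enumerate(u):
        for k, vk in enumerate(v):
            d = j + k
            if d < n:
                out[d] = (out[d] + uj * vk) % P
            else:  # reduce with t^n = I
                out[d - n] = (out[d - n] + I * uj * vk) % P
    return out

def pow_mod(u, e, n):
    """Square-and-multiply exponentiation in F_P[t]/(t^n - I)."""
    result = [1] + [0] * (n - 1)
    while e:
        if e & 1:
            result = mul_mod(result, u, n)
        u = mul_mod(u, u, n)
        e >>= 1
    return result

omega = pow(I, (P - 1) // 3, P)   # i^((q-1)/3), claimed to be a primitive cube root of 1
y_q = pow_mod([0, 1], P, 2)       # y^q in F_q[y]/(y^2 - i)
x_q = pow_mod([0, 1, 0], P, 3)    # x^q in F_q[x]/(x^3 - i)
```

With these values one expects `y_q` to represent $-y$ and `x_q` to represent $\omega x$.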

The cyclotomic subgroup membership equation, rewritten as $\alpha^{q^2} \cdot \alpha = \alpha^q$, is therefore:

$$(a + b\omega^2 x + c\omega^4 x^2)(a + bx + cx^2) = \bar{a} + \bar{b}\omega x + \bar{c}\omega^2 x^2,$$

which, upon expanding and reducing modulo $x^3 - i$ and modulo $\Phi_3(\omega) = \omega^2 + \omega + 1$, becomes

$$(a^2 - \bar{a} - ibc) + \omega(ic^2 - \bar{b} - ab)x + \omega^2(b^2 - \bar{c} - ac)x^2 = 0. \qquad (4)$$

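This equation defines the variety $\mathrm{Res}_{\mathbb{F}_{q^6}/\mathbb{F}_{q^2}} G_{\Phi_6}(q)$, since each $\mathbb{F}_{q^2}$-coefficient of $x^j$ must equal zero. Solving for $bc$, $ab$ and $ac$, one obtains:

$$bc = (a^2 - \bar{a})/i, \qquad ab = ic^2 - \bar{b}, \qquad ac = b^2 - \bar{c}.$$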

Substituting these into the original squaring formula (3) then gives

$$\begin{aligned} A &= a^2 + 2ibc = a^2 + 2i(a^2 - \bar{a})/i = 3a^2 - 2\bar{a}, \\ B &= ic^2 + 2ab = ic^2 + 2(ic^2 - \bar{b}) = 3ic^2 - 2\bar{b}, \\ C &= b^2 + 2ac = b^2 + 2(b^2 - \bar{c}) = 3b^2 - 2\bar{c}. \end{aligned}$$
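The compositum formulae can be tested on a toy field. The Python sketch below is an illustration under stated assumptions: the toy prime, the random sampling and all function names are hypothetical. It projects random elements into $G_{\Phi_6}(q)$ with the exponent $(q^3 - 1)(q + 1) = (q^6 - 1)/\Phi_6(q)$, and then compares the three-squaring formula with a full multiplication.

```python
import random

P = 31  # toy prime with P % 6 == 1 (illustrative assumption)
I = next(i for i in range(2, P)
         if pow(i, (P - 1) // 2, P) != 1 and pow(i, (P - 1) // 3, P) != 1)

# F_{q^2} = F_q[y]/(y^2 - I) as pairs; the q-power Frobenius is y -> -y
def add2(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

def sub2(u, v):
    return ((u[0] - v[0]) % P, (u[1] - v[1]) % P)

def scale2(k, u):
    return (k * u[0] % P, k * u[1] % P)

def conj2(u):
    return (u[0], -u[1] % P)

def mul2(u, v):
    return ((u[0] * v[0] + I * u[1] * v[1]) % P,
            (u[0] * v[1] + u[1] * v[0]) % P)

# F_{q^6} = F_{q^2}[x]/(x^3 - I) as triples (a, b, c) = a + b*x + c*x^2
ONE6 = ((1, 0), (0, 0), (0, 0))

def mul6(s, t):
    a0, a1, a2 = s
    b0, b1, b2 = t
    c0 = add2(mul2(a0, b0), scale2(I, add2(mul2(a1, b2), mul2(a2, b1))))
    c1 = add2(add2(mul2(a0, b1), mul2(a1, b0)), scale2(I, mul2(a2, b2)))
    c2 = add2(add2(mul2(a0, b2), mul2(a1, b1)), mul2(a2, b0))
    return (c0, c1, c2)

def pow6(s, e):
    result = ONE6
    while e:
        if e & 1:
            result = mul6(result, s)
        s = mul6(s, s)
        e >>= 1
    return result

def cyclotomic_square6(s):
    """Compositum formulae; valid only for s in G_{Phi_6}(q)."""
    a, b, c = s
    A = sub2(scale2(3, mul2(a, a)), scale2(2, conj2(a)))      # 3a^2 - 2*conj(a)
    B = sub2(scale2(3 * I, mul2(c, c)), scale2(2, conj2(b)))  # 3ic^2 - 2*conj(b)
    C = sub2(scale2(3, mul2(b, b)), scale2(2, conj2(c)))      # 3b^2 - 2*conj(c)
    return (A, B, C)

def to_cyclotomic(s):
    """Raise to (q^3 - 1)(q + 1) = (q^6 - 1)/Phi_6(q)."""
    return pow6(s, (P**3 - 1) * (P + 1))

rng = random.Random(6)
samples = [tuple((rng.randrange(P), rng.randrange(P)) for _ in range(3))
           for _ in range(20)]
```

Only the three `mul2(u, u)` calls are genuine $\mathbb{F}_{q^2}$-squarings; everything else is additions, conjugations and multiplications by the small constants 2, 3 and $i$.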


$\mathbb{F}_{q^6}$ as a cubic over a quadratic extension. As before, let $\mathbb{F}_{q^6} = \mathbb{F}_q[z]/(z^6 - i)$, with $i \in \mathbb{F}_q$ a sextic non-residue. Let the tower of extensions be given explicitly by $\mathbb{F}_{q^2} = \mathbb{F}_q[y]/(y^2 - i)$ and $\mathbb{F}_{q^6} = \mathbb{F}_{q^2}[x]/(x^3 - \sqrt{i})$, with elements represented in the basis:

$$\alpha = (a_0 + a_1 y) + (b_0 + b_1 y)x + (c_0 + c_1 y)x^2 = a + bx + cx^2,$$

which is superficially the same as equation (2), but where now the isomorphism is given by $y = z^3$, $x = z$ (so that $\sqrt{i} = y$). The squaring formula is identical to (3) with $i \leftarrow \sqrt{i}$.

With this representation one can see that the Frobenius automorphism acts on $x$ as multiplication by a sixth root of unity in $\mathbb{F}_q$, which we shall also call $\omega$. Noting that $q \equiv 1 \pmod 6$, observe that:

$$x^q = x^{q-1} \cdot x = (x^3)^{(q-1)/3} \cdot x = \sqrt{i}^{\,(q-1)/3} \cdot x = i^{(q-1)/6} \cdot x.$$

Since $i$ is a sextic non-residue in $\mathbb{F}_q$, $\omega = i^{(q-1)/6}$ is a primitive sixth root of unity in $\mathbb{F}_q$. Hence $x^q = \omega x$, and similarly $x^{q^2} = (\omega x)^q = \omega^2 x$.
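A numerical check of this sixth-root Frobenius, assuming the same toy prime and non-residue search used earlier (both illustrative), works directly in $\mathbb{F}_q[z]/(z^6 - i)$ with $x = z$. The helper names are hypothetical.

```python
P = 31  # toy prime with P % 6 == 1 (illustrative assumption)
I = next(i for i in range(2, P)
         if pow(i, (P - 1) // 2, P) != 1 and pow(i, (P - 1) // 3, P) != 1)

def mul_mod(u, v, n):
    """Product of coefficient lists u, v in F_P[t]/(t^n - I)."""
    out = [0] * n
    for j, uj in enumerate(u):
        for k, vk in enumerate(v):
            d = j + k
            if d < n:
                out[d] = (out[d] + uj * vk) % P
            else:  # reduce with t^n = I
                out[d - n] = (out[d - n] + I * uj * vk) % P
    return out

def pow_mod(u, e, n):
    result = [1] + [0] * (n - 1)
    while e:
        if e & 1:
            result = mul_mod(result, u, n)
        u = mul_mod(u, u, n)
        e >>= 1
    return result

omega = pow(I, (P - 1) // 6, P)   # i^((q-1)/6)
z = [0, 1, 0, 0, 0, 0]            # x = z in F_q[z]/(z^6 - i)
z_q = pow_mod(z, P, 6)            # expected: omega * z
z_q2 = pow_mod(z, P * P, 6)       # expected: omega^2 * z
```

The checks confirm that $\omega$ has order exactly six and is a root of $\Phi_6(\omega) = \omega^2 - \omega + 1$, which is the reduction used next.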

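With $x^q = \omega x$ and $x^{q^2} = \omega^2 x$, the cyclotomic subgroup membership equation (1) simplifies to:

$$(a + b\omega^2 x + c\omega^4 x^2)(a + bx + cx^2) = \bar{a} + \bar{b}\omega x + \bar{c}\omega^2 x^2,$$

which, upon expanding and reducing modulo $x^3 - \sqrt{i}$ and modulo $\Phi_6(\omega) = \omega^2 - \omega + 1$, becomes

$$(a^2 - \bar{a} - \sqrt{i}\,bc) - \omega(\sqrt{i}\,c^2 + \bar{b} - ab)x + \omega^2(b^2 - \bar{c} - ac)x^2 = 0. \qquad (5)$$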

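Solving for $bc$, $ab$ and $ac$, one obtains:

$$bc = (a^2 - \bar{a})/\sqrt{i}, \qquad ab = \sqrt{i}\,c^2 + \bar{b}, \qquad ac = b^2 - \bar{c}.$$

Substituting these into the revised squaring formula gives

$$\begin{aligned} A &= a^2 + 2\sqrt{i}\,bc = a^2 + 2\sqrt{i}(a^2 - \bar{a})/\sqrt{i} = 3a^2 - 2\bar{a}, \\ B &= \sqrt{i}\,c^2 + 2ab = \sqrt{i}\,c^2 + 2(\sqrt{i}\,c^2 + \bar{b}) = 3\sqrt{i}\,c^2 + 2\bar{b}, \\ C &= b^2 + 2ac = b^2 + 2(b^2 - \bar{c}) = 3b^2 - 2\bar{c}. \end{aligned}$$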
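The tower version can be checked in the same way. In this sketch (toy prime, sampling and function names are illustrative assumptions), $\mathbb{F}_{q^6} = \mathbb{F}_{q^2}[x]/(x^3 - y)$ with $y = \sqrt{i}$. Multiplication by $y$ is implemented as a component swap, and the formula for $B$ carries a plus sign in front of $2\bar{b}$, unlike the compositum case.

```python
import random

P = 31  # toy prime with P % 6 == 1 (illustrative assumption)
I = next(i for i in range(2, P)
         if pow(i, (P - 1) // 2, P) != 1 and pow(i, (P - 1) // 3, P) != 1)

# F_{q^2} = F_q[y]/(y^2 - I) as pairs (u0, u1); y plays the role of sqrt(i)
def add2(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

def sub2(u, v):
    return ((u[0] - v[0]) % P, (u[1] - v[1]) % P)

def scale2(k, u):
    return (k * u[0] % P, k * u[1] % P)

def conj2(u):
    """q-power Frobenius on F_{q^2}: y -> -y."""
    return (u[0], -u[1] % P)

def mul2(u, v):
    return ((u[0] * v[0] + I * u[1] * v[1]) % P,
            (u[0] * v[1] + u[1] * v[0]) % P)

def mul_y(u):
    """Multiply by y = sqrt(i): swap the components, multiply one by I."""
    return (I * u[1] % P, u[0])

# F_{q^6} = F_{q^2}[x]/(x^3 - y) as triples (a, b, c) = a + b*x + c*x^2
ONE6 = ((1, 0), (0, 0), (0, 0))

def mul6(s, t):
    """Schoolbook product reduced with x^3 = y and x^4 = y*x."""
    a0, a1, a2 = s
    b0, b1, b2 = t
    c0 = add2(mul2(a0, b0), mul_y(add2(mul2(a1, b2), mul2(a2, b1))))
    c1 = add2(add2(mul2(a0, b1), mul2(a1, b0)), mul_y(mul2(a2, b2)))
    c2 = add2(add2(mul2(a0, b2), mul2(a1, b1)), mul2(a2, b0))
    return (c0, c1, c2)

def pow6(s, e):
    result = ONE6
    while e:
        if e & 1:
            result = mul6(result, s)
        s = mul6(s, s)
        e >>= 1
    return result

def cyclotomic_square6(s):
    """Tower formulae; valid only for s in G_{Phi_6}(q)."""
    a, b, c = s
    A = sub2(scale2(3, mul2(a, a)), scale2(2, conj2(a)))         # 3a^2 - 2*conj(a)
    B = add2(scale2(3, mul_y(mul2(c, c))), scale2(2, conj2(b)))  # 3*sqrt(i)*c^2 + 2*conj(b)
    C = sub2(scale2(3, mul2(b, b)), scale2(2, conj2(c)))         # 3b^2 - 2*conj(c)
    return (A, B, C)

def to_cyclotomic(s):
    """Raise to (q^3 - 1)(q + 1) = (q^6 - 1)/Phi_6(q)."""
    return pow6(s, (P**3 - 1) * (P + 1))

rng = random.Random(12)
samples = [tuple((rng.randrange(P), rng.randrange(P)) for _ in range(3))
           for _ in range(20)]
```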


3.3 Observations 

Both sets of formulae for these degree six extensions are remarkably simple, requiring just three $\mathbb{F}_{q^2}$-squarings to square an element of $G_{\Phi_6}(q)$, which is analogous to the result in §3.1 that requires two $\mathbb{F}_q$-squarings to square an element of $G_{\Phi_2}(q)$. Combining this with the result from §3.1 for squaring a generic element of $\mathbb{F}_{q^2}$ means that, for SFFs, squaring in $G_{\Phi_6}(q)$ requires only six $\mathbb{F}_q$-multiplications, which matches the result of Stam and Lenstra.

Strictly speaking, the formulae require knowledge of the action of the Frobenius on elements of $\mathbb{F}_{q^2}$, which, simple as it is, does entail Weil restriction of equations (4) and (5) to $\mathbb{F}_q$. However, if one ignores the arithmetic of $\mathbb{F}_{q^2}$ and restricts directly to $\mathbb{F}_q$ as in [15], then the above formulae are obscured, and indeed they were missed by Granger et al. It is in the sense that the formulae were discovered in this way that we mean the restriction is to $\mathbb{F}_{q^2}$ only.

Observe also that for the extension tower one needs to multiply by $\sqrt{i} \in \mathbb{F}_{q^2}$, whereas for the compositum one has $i \in \mathbb{F}_q$. The cost of the former is, however, not much more than that of the latter, since the basis for $\mathbb{F}_{q^2}/\mathbb{F}_q$ is $\{1, \sqrt{i}\}$, and so multiplying a value in $\mathbb{F}_{q^2}$ by $\sqrt{i}$ involves just a component swap and a multiplication by $i$.
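The component-swap claim can be checked exhaustively on a toy field. In this sketch the prime and the function names are illustrative assumptions: $(u_0 + u_1\sqrt{i})\sqrt{i} = i u_1 + u_0\sqrt{i}$, so only one multiplication by $i$ is needed.

```python
P = 31  # toy prime (illustrative assumption)
I = next(i for i in range(2, P) if pow(i, (P - 1) // 2, P) != 1)  # a non-square

def mul2(u, v):
    """General product in F_{q^2} = F_q[y]/(y^2 - I), where y = sqrt(I)."""
    return ((u[0] * v[0] + I * u[1] * v[1]) % P,
            (u[0] * v[1] + u[1] * v[0]) % P)

def mul_by_sqrt_i(u):
    """(u0 + u1*y)*y = I*u1 + u0*y: a component swap plus one product by I."""
    u0, u1 = u
    return (I * u1 % P, u0)
```

Compared with the general `mul2`, which costs several base-field products, `mul_by_sqrt_i` costs a single multiplication by $i$, and that multiplication is itself cheap when $i$ is small.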

4 Comparison with Prior Work 

In this section we compare the efficiency of the squaring formulae derived in §3 with the most efficient results in the literature.
4.1 Operation Counts

Let $m$ and $s$ be the time required to perform an $\mathbb{F}_q$-multiplication and an $\mathbb{F}_q$-squaring, respectively. Since the cost of computing a squaring using our formulae or others reduces to computing squarings in a subfield, we use the notation $S_d$ and $M_d$ to denote the time required to compute the square of one, or the product of two, generic elements of $\mathbb{F}_{q^d}$. In our estimates we do not include the time for modular additions and subtractions since, although not negligible, they are not the dominant operations. This assumes that multiplication by the elements $i, j$ used in §3.1 and §3.2 can be effected with very few modular shifts and additions when needed; see [3] for justification of this assumption.

We focus on TFFs with extension degrees 6, 12, 18 and 24 over $\mathbb{F}_q$, which are the main extension degrees of interest in PBC. However, one can easily extrapolate cost estimates for any field whose extension degree is of the form $k = 2^a 3^b$. To estimate the cost of a multiplication in $\mathbb{F}_{q^k}$, we use the function $v(k)m$, where $v(k) = 3^a 6^b$. The 3 and 6 in this estimate arise from the use of Karatsuba-Ofman multiplication [25] for each quadratic and each cubic extension, respectively. Our cost function differs from that in [24] and [15], which is $3^a 5^b$, because these assume that the Toom-Cook multiplication [55] of two degree three polynomials is more efficient than Karatsuba-Ofman multiplication; however, this is not usually the case [5]. Hence the cost of an $\mathbb{F}_{q^k}$-multiplication for the given extension degrees is 18m, 54m, 108m and 162m, respectively.

The cost of squaring a generic element of $\mathbb{F}_{q^k}$ is more complicated to estimate, since there are several squaring techniques and one needs to determine which is faster for a given application. Using the observation in [55], one can deduce that $S_k = 2M_{k/2} = 2 \cdot 3^{a-1} 6^b$. In addition, the results due to Chung and Hasan [5] give three alternative formulae for squaring using the final degree three polynomial, at a cost of $3M_{k/3} + 2S_{k/3}$, $2M_{k/3} + 3S_{k/3}$ and $M_{k/3} + 4S_{k/3}$. For simplicity we use the second of the Chung-Hasan formulae, which, incidentally, for the above extension degrees requires exactly the same number of $\mathbb{F}_q$-multiplications as when using [34].
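The multiplication-only parts of this cost model fit in a few lines of Python. The sketch below uses hypothetical function names and, like the text, ignores additions. It evaluates $v(k) = 3^a 6^b$, $S_k = 2M_{k/2}$ and the present result's $3S_{k/3}$ in units of $m$. For $k = 6, 12, 18, 24$ it reproduces the $M_k$ values quoted above, together with the $S_k$ and present-result columns of Table 1.

```python
def two_three_exponents(k):
    """Return (a, b) with k = 2^a * 3^b; reject other extension degrees."""
    a = b = 0
    while k % 2 == 0:
        k //= 2
        a += 1
    while k % 3 == 0:
        k //= 3
        b += 1
    if k != 1:
        raise ValueError("extension degree must be of the form 2^a 3^b")
    return a, b

def mult_cost(k):
    """M_k in units of m: Karatsuba-Ofman gives 3 per quadratic, 6 per cubic step."""
    a, b = two_three_exponents(k)
    return 3**a * 6**b

def generic_square_cost(k):
    """S_k = 2 M_{k/2} (k even), in units of m."""
    if k % 2:
        raise ValueError("k must be even")
    return 2 * mult_cost(k // 2)

def cyclotomic_square_cost(k):
    """Present result: three squarings in F_{q^{k/3}}, i.e. 3 S_{k/3}, for 6 | k."""
    if k % 6:
        raise ValueError("k must be divisible by 6")
    return 3 * generic_square_cost(k // 3)

DEGREES = (6, 12, 18, 24)
```

The ratio `cyclotomic_square_cost(k) / generic_square_cost(k)` is exactly 1/2 for each of these degrees, which is the halving referred to in the discussion of Table 1.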

Table 1 contains counts of the number of $\mathbb{F}_q$-multiplications and $\mathbb{F}_q$-squarings that are required to perform a generic squaring in $\mathbb{F}_{q^k}$, and a squaring in $G_{\Phi_k}(q)$ via the methods arising from Weil restriction to the quadratic, cubic and $\mathbb{F}_q$ subfields, respectively.

Table 1. Operation counts for squaring in various Weil restrictions of $G_{\Phi_k}(q)$ for $6 \mid k$

| $k$ | $S_k$ (generic) | $\mathrm{Res}_{\mathbb{F}_{q^k}/\mathbb{F}_{q^{k/2}}} G_{\Phi_2}(q^{k/2})$ (Stam-Lenstra [34]) | $\mathrm{Res}_{\mathbb{F}_{q^k}/\mathbb{F}_{q^{k/3}}} G_{\Phi_6}(q^{k/6})$ (Present result) | $\mathrm{Res}_{\mathbb{F}_{q^k}/\mathbb{F}_q} G_{\Phi_6}(q^{k/6})$ (Granger et al. [15]) |
|---|---|---|---|---|
| 6 | 12m | $2S_3 = 4m + 6s$ | $3S_2 = 6m$ | $3m + 6s$ |
| 12 | 36m | $2S_6 = 24m$ | $3S_4 = 18m$ | $18m + 12s$ |
| 18 | 72m | $2S_9 = 24m + 30s$ | $3S_6 = 36m$ | |
| 24 | 108m | $2S_{12} = 72m$ | $3S_8 = 54m$ | $84m + 24s$ |

As is clear from the table, the present result reduces the squaring cost, relative to squaring a generic element of each of these fields, by a factor of two for every degree, which greatly improves the speed of an exponentiation. If we also assume for the moment that $m \approx s$, then our squaring takes approximately two thirds of the time of the Stam-Lenstra result in the third column. Compared with the final column, our formulae win comprehensively; indeed, for $k = 24$ the result from [15] is worse than when using $\mathrm{Res}_{\mathbb{F}_{q^k}/\mathbb{F}_{q^{k/2}}} G_{\Phi_2}(q^{k/2})$ and is barely better than Karatsuba-Ofman. Hence restricting to the cubic subfield is clearly the most efficient approach for fields of this form.

Remark 1. Note that we have not included the Stam-Lenstra squaring cost for $k = 6$ because this requires $q \equiv 2$ or $5 \pmod 9$, whereas the use of the sextic twist requires $D = 3$ and hence $p \equiv 1 \pmod 3$ (so that also $q \equiv 1 \pmod 3$), thus making such fields less desirable for pairings. An open problem posed in [15] asked for a generalisation of the Stam-Lenstra result to cyclotomic fields of degree different from six, for pairings. We have shown that our formula for squaring in the cyclotomic subgroup of $\mathbb{F}_{q^6}^*$ when $q \equiv 1 \pmod 6$ matches the extremely efficient degree six squaring of [34] (while also permitting the use of sextic twists), and extends efficiently to higher degree extensions. Hence, in the sense that we have provided an efficient tailor-made solution for pairings, we believe we have answered this question affirmatively.


4.2 Applicability of Method to Higher Powerings 

As we have shown, all of the techniques to date for producing faster arithmetic in the cyclotomic subgroup result from an application of the Weil restriction of scalars to the equation defining membership of this group. A natural question to ask is whether this will work for extensions of any other degree. The answer is that it does, but that it appears very unlikely to provide a faster alternative to squaring.

Let $\delta(k)$ be the degree of the equation $\alpha^{\Phi_k(q)} = 1$ once it has been expanded and the linear Frobenius operation has been incorporated. If $\delta = 2$, then the variety resulting from the Weil restriction down to any intermediate subfield may help with squaring. If $\delta > 2$, then the resulting equations may help when raising an element of the cyclotomic subgroup to the $\delta$-th power [31]. However, this is unlikely to be faster than sequential squaring for an exponentiation, even when squaring is slow. For example, for $G_{\Phi_3}(q)$ one finds that $\delta = 3$ and the resulting equations aid cubing. However, an exponentiation needs about $\log_2 3$ times as many squarings as cubings, so cubing only pays off if a cubing costs less than $\log_2 3$ squarings; since the ratio of the cost of a cubing to that of a squaring is $> \log_2 3$, it is better to square than to cube during an exponentiation in this case.

Complementary to this is the fact that $\delta \le 2$ only for extensions of degree $k = 2^a 3^b$ with $a \ge 1$, $b \ge 0$. Hence pairings with embedding degrees of this form are ideally suited to exploit our fast squaring technique and that of Stam and Lenstra.

5 Application to Pairing-Based Cryptography 

In this section we apply our squaring formula to the extension field arithmetic required in the final exponentiation of a pairing computation, and in post-pairing exponentiations, for two concrete examples. We here assume that $\mathbb{F}_q = \mathbb{F}_p$.
The use of our formulae is possible because, for any pairing, the codomain is a subgroup of $G_{\Phi_k}(p) \subset \mathbb{F}_{p^k}^*$, where $k$ is the embedding degree of the curve. For instance, the Tate pairing on an elliptic curve has the following form: for $r$ coprime to $p$ we have

$$e_r : E(\mathbb{F}_p)[r] \times E(\mathbb{F}_{p^k})/rE(\mathbb{F}_{p^k}) \to \mathbb{F}_{p^k}^*/(\mathbb{F}_{p^k}^*)^r.$$

In order to obtain a unique coset representative, the output is usually powered by $(p^k - 1)/r$. Since

$$\frac{p^k - 1}{r} = \frac{p^k - 1}{\Phi_k(p)} \cdot \frac{\Phi_k(p)}{r},$$

the first term $(p^k - 1)/\Phi_k(p)$ can be computed easily using the Frobenius, a few multiplications and a division, while the remaining ‘hard’ part must be computed as a proper exponentiation. Since for any element $\alpha \in \mathbb{F}_{p^k}^*$ we have $\alpha^{(p^k - 1)/\Phi_k(p)} \in G_{\Phi_k}(p)$, fast arithmetic for this group can be used.
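For the two embedding degrees used below, the easy part factors into Frobenius-friendly terms. This Python sketch (function names are illustrative assumptions) checks that $(p^6 - 1)/\Phi_6(p) = (p^3 - 1)(p + 1)$ and $(p^{12} - 1)/\Phi_{12}(p) = (p^6 - 1)(p^2 + 1)$. Raising to $p^3 - 1$, for instance, is a Frobenius power followed by one division, which is consistent with the single division mentioned above.

```python
def phi(k, p):
    """Phi_k(p) for the embedding degrees k = 6 and k = 12 used in this section."""
    if k == 6:
        return p * p - p + 1
    if k == 12:
        return p**4 - p**2 + 1
    raise ValueError("only k = 6 and k = 12 are handled in this sketch")

def easy_exponent(k, p):
    """(p^k - 1)/Phi_k(p) written as a product of Frobenius-friendly factors."""
    if k == 6:
        return (p**3 - 1) * (p + 1)   # alpha^(p^3 - 1) = Frobenius^3(alpha) / alpha
    if k == 12:
        return (p**6 - 1) * (p**2 + 1)
    raise ValueError("only k = 6 and k = 12 are handled in this sketch")
```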

5.1 MNT Curves 

MNT curves were discovered in 2001 by Miyaji et al. and consist of three families of ordinary elliptic curves with embedding degrees 3, 4 and 6 [35]. For efficiency reasons, the last family is of most interest; for it, the parametrisation of the base field, group cardinality and trace of Frobenius are given by:

$$p(x) = x^2 + 1, \qquad r(x) = x^2 - x + 1, \qquad t(x) = x + 1.$$

Using the method of Scott et al. [33], the final exponentiation reduces to the powering of an element of $G_{p^2 - p + 1}$ by $x$. The maximum twist available has degree two, and so for efficiency one would like to use $\mathbb{F}_{p^3}$ arithmetic with a quadratic extension of this field to give $\mathbb{F}_{p^6}$. This implies that one should use the compositum construction of §3.2. One can alternatively use a quadratic extension of a cubic extension for the Miller loop computation, and then switch to the isomorphic tower construction of §3.2 for the final exponentiation. This isomorphism is just a permutation of basis elements, and so switching between representations, even during the Miller loop, is viable, and therefore permits the use of the fast multiplication results of [9].

The one condition that must be satisfied in order for our method to apply is that $p \equiv 1 \pmod 6$, which requires $x \equiv 0 \pmod 6$; this eliminates two thirds of potential MNT curves. While this is restrictive, the benefits of ensuring this condition are clear.
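The MNT relations can be checked numerically. In this sketch (names hypothetical), the identity $\Phi_6(p) = x^4 + x^2 + 1 = r(x)(x^2 + x + 1)$ shows that the hard part of the final exponentiation is $x^2 + x + 1 = p + x$. So $\alpha^{(x^2 + x + 1)} = \alpha^p \cdot \alpha^x$ needs one Frobenius and one powering by $x$, in line with the statement above. The last check mirrors the $x \equiv 0 \pmod 6$ condition.

```python
def mnt6(x):
    """MNT parametrisation for embedding degree 6: (p, r, t) at an integer x."""
    return x * x + 1, x * x - x + 1, x + 1

def mnt6_checks(x):
    p, r, t = mnt6(x)
    hard = x * x + x + 1  # Phi_6(p) / r
    return {
        "order": p + 1 - t == r,                        # #E(F_p) = p + 1 - t = r
        "hard_part": p * p - p + 1 == r * hard,         # Phi_6(p) = r(x)(x^2 + x + 1)
        "hard_is_p_plus_x": hard == p + x,              # alpha^hard = alpha^p * alpha^x
        "sff_condition": (p % 6 == 1) == (x % 6 == 0),  # p = 1 (mod 6) iff x = 0 (mod 6)
    }
```

Since $p = x^2 + 1$ must be an odd prime, $x$ is even, and only one even residue class modulo 6 (namely 0) survives, hence the two-thirds figure.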

5.2 BN Curves 

The Barreto-Naehrig family of pairing-friendly curves was reported in 2005 and has embedding degree 12 [3]. The parametrisation of the base field, group cardinality and trace of Frobenius are given by:

$$p(x) = 36x^4 + 36x^3 + 24x^2 + 6x + 1, \qquad r(x) = 36x^4 + 36x^3 + 18x^2 + 6x + 1, \qquad t(x) = 6x^2 + 1.$$

Note that for odd $x$, $p \equiv 3 \pmod 4$, so $\mathbb{F}_{p^{12}}$ is not pairing-friendly. However, choosing $p \equiv 3 \pmod 4$ enables the use of the initial extension $\mathbb{F}_{p^2} = \mathbb{F}_p[x]/(x^2 + 1)$, which permits highly efficient arithmetic. Since BN curves possess a sextic twist, such efficient subfield arithmetic is very desirable. With this choice of extension, $\mathbb{F}_{q^6}$ is towering-friendly with $\mathbb{F}_q = \mathbb{F}_{p^2}$, since $p^2 \equiv 1 \pmod 6$ for all primes $p > 3$, and is also squaring-friendly.
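The BN facts used here can be spot-checked for small integer parameters. This Python sketch uses hypothetical names and checks only polynomial identities, not primality. It verifies that $\#E = p + 1 - t = r$, that $r \mid \Phi_{12}(p)$, the split $p^{12} - 1 = (p^6 - 1)(p^2 + 1)\Phi_{12}(p)$, that odd $x$ forces $p \equiv 3 \pmod 4$, and that $q = p^2 \equiv 1 \pmod 6$.

```python
def bn(x):
    """BN parametrisation (p, r, t) evaluated at an integer x."""
    p = 36 * x**4 + 36 * x**3 + 24 * x**2 + 6 * x + 1
    r = 36 * x**4 + 36 * x**3 + 18 * x**2 + 6 * x + 1
    t = 6 * x**2 + 1
    return p, r, t

def bn_checks(x):
    p, r, t = bn(x)
    phi12 = p**4 - p**2 + 1
    return {
        "order": p + 1 - t == r,                                # #E(F_p) = r
        "r_divides_phi12": phi12 % r == 0,                      # r | Phi_12(p)
        "exponent_split": p**12 - 1 == (p**6 - 1) * (p**2 + 1) * phi12,
        "odd_x_gives_3_mod_4": x % 2 == 0 or p % 4 == 3,
        "q_is_1_mod_6": (p * p) % 6 == 1,                       # q = p^2
    }
```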

Again, the efficient final exponentiation of Scott et al. [32] can be applied to reduce the final powering to essentially just three exponentiations by $x$. In practice, it is recommended that $x$ be chosen to have as low a Hamming weight as possible, to minimise the resulting cost of the Miller loop [10]. The exponentiations by $x$ are then dominated by squarings, so for the final exponentiation the entries in Table 1 imply that this cost will be approximately 75% of the cost of the previous fastest method. Indeed, using our squaring method with the degree six extension of $\mathbb{F}_{p^2}$ given by the tower in §3.2 (i.e., a tower having extensions of $\mathbb{F}_p$ of degrees 1-2-4-12), a simple estimate of the cost of performing the final powering with Scott et al.'s method for a 256-bit prime, i.e., at AES 128-bit security, is 4856 $\mathbb{F}_p$-multiplications. In contrast, using the tower extension with degrees 1-2-6-12 and the Stam-Lenstra result of [34] for the final extension, this figure is 5971 $\mathbb{F}_p$-multiplications, so our method should be approximately 20% faster in practice. Furthermore, by excluding $p \equiv 3 \pmod 4$, the arithmetic for PFFs would be even slower for both towers.

With regard to post-pairing exponentiation, one is free to use the method of [13], which uses a clever application of the GLV decomposition [14]. For BN curves one obtains a four-dimensional decomposition and hence uses quadruple exponentiation to achieve this speed-up. Since there will be more multiplications than for the final powering, the impact of our squaring formulae on the cost of exponentiation will be less pronounced, but still significant.

Another factor to consider for post-pairing exponentiations is that the trace-based methods of LUC [33], XTR [26,35] and XTR over extension fields [27] are known to be faster than pH] and [16] for a single exponentiation. However, we expect the Galbraith-Scott method to be superior, since the resulting exponents in the quadruple exponentiation are one quarter the size, and for the trace-based methods efficient algorithms appear to be known only for single and double exponentiation [13], ruling out their application in this context. The same reasoning applies to schemes that require a product of pairings, each with an individual post-pairing exponentiation, such as [7].

The trace methods are also ruled out of Scott et al.'s final-powering method, since many multiplications are required, whereas the trace methods only permit exponentiation. Hence, barring any improvements in trace-based multi-exponentiation algorithms, we expect our formulae to feature in the most efficient way to implement pairings, their products and exponentiation.
6 Application to Torus-Based Cryptography


Our central result may also be applied to torus-based cryptography (TBC), which is based on the mathematics of algebraic tori, introduced to cryptography by Rubin and Silverberg in 2003 [30]. While for degree six extensions of prime fields our squaring formulae only match the fastest implementation of CEILIDH [TE], which uses [3T], for nearly all sixth degree extensions of non-prime fields our squaring method is the most efficient known.

The implementation of $T_{30}(\mathbb{F}_p) = G_{\Phi_{30}}(p)$ by van Dijk et al. used the Stam-Lenstra result for $p \equiv 2$ or $5 \pmod 9$ [11]. This condition implies that $q = p^k \equiv 2$ or $5 \pmod 9$ whenever $k = 5^m$. Hence the family of fields of extension degree $6 \cdot 5^m$ over $\mathbb{F}_q$ for $q \equiv 2$ or $5 \pmod 9$ matches our squaring efficiency for the cyclotomic subgroup. On the other hand, the condition $q \equiv 1 \pmod 6$ for SFFs is far less restrictive, and in fact can be said to apply to three quarters of all finite fields.

With regard to compression of torus elements, which is the central function of torus-based cryptography, let $p \equiv 1 \pmod 6$ and let the field construction for $\mathbb{F}_{q^6}$ be the compositum given in §3.2. We reorder the basis as follows:

$$\alpha = (a_0 + a_1x + a_2x^2) + (b_0 + b_1x + b_2x^2)y = a + by.$$

Assuming $\alpha \in G_{q^2 - q + 1}$, then as in [17] and explicitly in [2S], a straightforward analysis of condition (1) shows that such elements, excepting the identity, can be represented by two elements of $\mathbb{F}_q$. To compress, one writes $\alpha \neq 1$ as

$$\alpha = a + by = \frac{c - y}{c + y},$$

where $c = -(a + 1)/b$ for $b \neq 0$ and $c = 0$ if $b = 0$. Condition (1) now becomes

$$\left(\frac{c - y}{c + y}\right)^{q^2 - q + 1} = 1,$$

and leads to the equation $3c_0^2 + i - 3ic_1c_2 = 0$, where $c = c_0 + c_1x + c_2x^2$. Therefore there is redundancy between the $c_i$'s: one can eliminate $c_2$, for instance, which can be recovered from $c_0$ and $c_1$. The decompression map is just the inverse of this:

$$\psi : \mathbb{A}^2(\mathbb{F}_q) \to T_6(\mathbb{F}_q) \setminus \{1\} : (c_0, c_1) \mapsto \frac{3ic_0c_1 + 3ic_1^2x + (3c_0^2 + i)x^2 - 3ic_1y}{3ic_0c_1 + 3ic_1^2x + (3c_0^2 + i)x^2 + 3ic_1y},$$

with the condition $c_1 \neq 0$, which therefore represents all $q^2 - q$ non-identity elements in $G_{q^2 - q + 1}$.

Since this compression method works for all fields for which $p \equiv 1 \pmod 6$, achieves the maximum known compression for any algebraic torus, and has the fastest squaring available, we propose that such fields should be considered ideal candidates for TBC.
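The compression and decompression maps can be exercised end to end on a toy field. This is a minimal sketch, assuming a small prime with $p \equiv 1 \pmod 6$, working over $\mathbb{F}_q = \mathbb{F}_p$, and using Fermat-style inversion that is only adequate at toy sizes. All names are illustrative. It decompresses every pair $(c_0, c_1)$ with $c_1 \neq 0$, confirms that the $q^2 - q$ images are distinct non-identity elements of $G_{q^2 - q + 1}$, and checks the round trip through `compress`.

```python
P = 31  # toy prime with P % 6 == 1 (illustrative assumption)
I = next(i for i in range(2, P)
         if pow(i, (P - 1) // 2, P) != 1 and pow(i, (P - 1) // 3, P) != 1)

# F_{q^3} = F_q[x]/(x^3 - I): c0 + c1*x + c2*x^2 is stored as (c0, c1, c2)
ONE3, ZERO3, MINUS_ONE3, I3 = (1, 0, 0), (0, 0, 0), (P - 1, 0, 0), (I, 0, 0)

def add3(u, v):
    return tuple((s + t) % P for s, t in zip(u, v))

def neg3(u):
    return tuple(-s % P for s in u)

def mul3(u, v):
    u0, u1, u2 = u
    v0, v1, v2 = v
    return ((u0 * v0 + I * (u1 * v2 + u2 * v1)) % P,
            (u0 * v1 + u1 * v0 + I * u2 * v2) % P,
            (u0 * v2 + u1 * v1 + u2 * v0) % P)

def inv3(u):
    """Inverse via u^(q^3 - 2); adequate only at toy sizes."""
    result, e = ONE3, P**3 - 2
    while e:
        if e & 1:
            result = mul3(result, u)
        u = mul3(u, u)
        e >>= 1
    return result

# F_{q^6} = F_{q^3}[y]/(y^2 - I) in the reordered basis: (a, b) means a + b*y
def mul6(s, t):
    (a, b), (c, d) = s, t
    return (add3(mul3(a, c), mul3(I3, mul3(b, d))),
            add3(mul3(a, d), mul3(b, c)))

def pow6(s, e):
    result = (ONE3, ZERO3)
    while e:
        if e & 1:
            result = mul6(result, s)
        s = mul6(s, s)
        e >>= 1
    return result

def inv6(s):
    """1/(a + b*y) = (a - b*y)/(a^2 - I*b^2)."""
    a, b = s
    n = inv3(add3(mul3(a, a), neg3(mul3(I3, mul3(b, b)))))
    return (mul3(a, n), neg3(mul3(b, n)))

def compress(alpha):
    """alpha = a + b*y != 1 in G_{q^2-q+1}  ->  (c0, c1), where c = -(a + 1)/b."""
    a, b = alpha
    c = ZERO3 if b == ZERO3 else neg3(mul3(add3(a, ONE3), inv3(b)))
    return c[0], c[1]  # c2 is redundant and dropped

def decompress(c0, c1):
    """(c0, c1) with c1 != 0  ->  (c - y)/(c + y), where c2 is recovered
    from 3*c0^2 + i - 3*i*c1*c2 = 0."""
    c2 = (3 * c0 * c0 + I) * pow(3 * I * c1, -1, P) % P
    c = (c0, c1, c2)
    return mul6((c, MINUS_ONE3), inv6((c, ONE3)))
```

The exhaustive loop in the check is only feasible because `P` is tiny. A real implementation would use proper field inversion and constant-time arithmetic.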

Furthermore, as stated in [13], TBC parameters can be easily generated from pairing-friendly elliptic curves. The multi-exponentiation techniques described in §5.2, which one acquires from PBC when TBC parameters are generated in this way, mean that exponentiation in $T_6(\mathbb{F}_q)$ should be extremely efficient, and indeed faster than all other known methods.

Therefore, while it could be argued that the main application of TBC is to PBC (in terms of offering faster arithmetic and compression mechanisms for systems that may be used in practice), here TBC really benefits from PBC, thus demonstrating a neat symbiosis between the two application areas.

7 Conclusion 

We have presented a method to perform squaring extremely efficiently in the cyclotomic subgroup of $\mathbb{F}_{q^6}^*$, for $q \equiv 1 \pmod 6$. We have shown how to apply this result to fields of interest in pairing-based cryptography to obtain the fastest final- and post-pairing exponentiation algorithms, and also detailed why these fields are ideally suited for torus-based cryptography when $p \equiv 1 \pmod 6$.

Since these fields include those listed in the IEEE P1363.3/D1 draft standard for identity-based public-key cryptography, which use pairings over ordinary elliptic curves that permit the fastest pairing via a maximal twist, our result strongly supports their standardisation; it also demonstrates that the more general squaring-friendly fields introduced here warrant serious consideration for inclusion.

We leave it as an open problem to find similarly efficient squaring formulae for the remaining case $q \equiv -1 \pmod 6$.
Abstract. Research on efficient pairing implementation has focussed on reducing the loop length and on using high-degree twists. The existence of twists of degree larger than 2 is a very restrictive criterion, but luckily constructions for pairing-friendly elliptic curves with such twists exist. In fact, Freeman, Scott and Teske showed in their overview paper that often the best known methods of constructing pairing-friendly elliptic curves over fields of large prime characteristic produce curves that admit twists of degree 3, 4 or 6.

A few papers have presented explicit formulas for the doubling and 
the addition step in Miller’s algorithm, but the optimizations were all 
done for the Tate pairing with degree-2 twists, so the main usage of the 
high-degree twists remained incompatible with more efficient formulas. 

In this paper we present efficient formulas for curves with twists of degree 2, 3, 4 or 6. These formulas are significantly faster than their predecessors. We show how these faster formulas can be applied to Tate and ate pairing variants, thereby speeding up all practical suggestions for efficient pairing implementations over fields of large characteristic.

Keywords: Pairings, Miller functions, explicit formulas, Tate pairing, 
ate pairing, twists, Weierstrass curves. 


1 Introduction 

Many new protocols are based on pairings and so the construction of pairing- 
friendly curves and the efficiency of pairing computation has become a field of 
active research. The first wave of this research exhausted many tricks that can be 
applied inside a Miller iteration, resulting in significant computational speed ups 
I4l6|7134j ■ The second wave of improvements focussed on constructing pairing- 
friendly elliptic curves |5I1 113711 6181221911 7128) . and this research is extended 
and collected in [[TS]. The third and more recent wave of research has focussed
on reducing the loop length of Miller’s algorithm |:-i. r ,i2fil:-il.T2| to be as short as 
possible j42l2R] . Along the way, there have been several other clever optimizations 
that give faster pairings in certain scenarios, including compressed pairings |36] , 
single coordinate pairings p?T] , efficient methods of hashing to pairing-friendly 
groups [ 55] , and techniques that achieve a faster final exponentiation [sasg. 

After the introduction of projective coordinates for pairing computations in 
m, very little was heard about low level optimizations. This started to become 
more interesting lately for alternative curve shapes such as Edwards curves, studied in [14,27,2], and curves of the form $y^2 = x^3 + c^2$, studied in [13].

All of these improvements are presented in the context of the Tate pairing on curves with even embedding degrees and using only quadratic twists, since the nature of the Tate pairing allows for a relatively simple exposition and improves efficiency through denominator elimination. At the same time, curves with larger degree twists give much more efficient pairings, and choosing special curve shapes risked forgoing this larger benefit. On top of that, Galbraith [2U] studied the group orders of curves and their twists and showed that for Edwards curves only quadratic twists could be used, in the sense that the only twist which preserves the existence of a point of order 4 is a quadratic twist. This deterred further research on ate pairings and other variants for special curves. In this paper we show that it is possible to compute a small power of the ate pairing entirely on the twisted curve, so the curve can be chosen such that its twist admits a particular shape. We show the fields of definition for the respective coordinates. This provides a framework for converting Tate-like pairing computation formulas and operation counts to their ate-like analogues.

For BN curves [5], Akane, Nogami and Morikawa showed in [1] that the ate pairing itself can be computed on the twisted curve. Our result covers more general curves but computes the ate pairing only up to a power. Furthermore, the idea of using twists in order to cover curves of special shapes is new. In the context of Weierstrass curves, our result gives an easy way of computing the cost of evaluating the Miller function.

For all practically useful embedding degrees, the best methods of constructing pairing-friendly curves mostly produce elliptic curves of the form $y^2 = x^3 + ax + b$ with $a = 0$ or $b = 0$ (see [H]). In this paper we consider these two cases separately to give specialized pairing formulas in both scenarios. In particular, we achieve the fastest known formulas for computing pairings on general curves with $b = 0$ in weight-(1,2) coordinates. In addition, the point doubling formulas we derive for curves of this form are currently the fastest published point doubling formulas [10] across all forms of elliptic curves. For pairings on general curves with $a = 0$, we use standard projective coordinates. The doubling step on these curves is two field multiplications faster than the previous record for such curves. Furthermore, we also consider the case of computing pairings on curves with odd embedding degrees that employ cubic twists, where we present formulas which are significantly faster than their predecessors. Lastly, we also suggest an improvement to the formulas presented in [13]. Note that for ate pairings, speed-ups in the doubling and addition step save computations in fields whose sizes grow proportionately to the embedding degree. This means that applying these faster formulas to the ate pairing variants will give relative speed-ups which are consistent across all embedding degrees, and savings which do not suffer as the extension field arithmetic becomes more complex.

The rest of this paper is organized as follows. Section 2 provides a brief background on pairings. In Section 3 we present a modified method of computing the ate pairing where all operations involve points only on the twisted curve. This theoretical result is a key ingredient for efficient computation of the ate pairing and has applications outside the scope of this paper, e.g. for Edwards curves. We then show how Tate pairing formulas and operation counts can be easily modified to this method of computing the ate pairing. In Sections 4 and 5 we present faster formulas for pairing computations that employ quadratic, quartic or sextic twists. In Section 6 we present faster formulas for pairings on curves with odd embedding degrees divisible by 3. We compare our results with the state-of-the-art pairing formulas in Section 7.

2 Background on Pairings 

Let $p > 3$ be a prime, and let $E$ be an elliptic curve over $\mathbb{F}_q$, $\mathrm{char}(\mathbb{F}_q) = p$, with short Weierstrass equation $E : y^2 = x^3 + ax + b$ and point at infinity $O$. Let $r \neq p$ be a prime divisor of $n = \#E(\mathbb{F}_q) = q + 1 - t$ and let $k > 1$ be the embedding degree of $E$ with respect to $r$, i.e. $k$ is minimal with $r \mid q^k - 1$. For the $r$-torsion subgroup, we have $E[r] \subseteq E(\mathbb{F}_{q^k})$. Let $\mu_r \subset \mathbb{F}_{q^k}^*$ be the group of $r$-th roots of unity. For $m \in \mathbb{Z}$ and $P \in E[r]$, let $f_{m,P}$ be a function with divisor $\mathrm{div}(f_{m,P}) = m(P) - ([m]P) - (m - 1)(O)$. The reduced Tate pairing is defined as

$$T_r : E(\mathbb{F}_{q^k})[r] \times E(\mathbb{F}_{q^k})/[r]E(\mathbb{F}_{q^k}) \to \mu_r, \quad (P, Q) \mapsto f_{r,P}(Q)^{(q^k - 1)/r}.$$

In practice one restricts the arguments to groups of prime order $r$. If $r^2 \nmid n$, the most common choice is to take the groups

$$G_1 = E[r] \cap \ker(\phi_q - [1]) = E(\mathbb{F}_q)[r], \qquad G_2 = E[r] \cap \ker(\phi_q - [q]) \subseteq E(\mathbb{F}_{q^k}),$$

where $\phi_q$ is the $q$-power Frobenius endomorphism on $E$. The groups $G_1$ and $G_2$ are the eigenspaces of $\phi_q$ on $E[r]$, and we have $E[r] = G_1 \oplus G_2$. From now on, we consider $e_r$, the reduced Tate pairing restricted to $G_1 \times G_2$, i.e.

$$e_r : G_1 \times G_2 \to \mu_r, \quad (P, Q) \mapsto f_{r,P}(Q)^{(q^k - 1)/r}.$$

Let $T = t - 1$. Restricting the Tate pairing to $G_2 \times G_1$ leads to the ate pairing

$$a_T : G_2 \times G_1 \to \mu_r, \quad (Q, P) \mapsto f_{T,Q}(P)^{(q^k - 1)/r}.$$

Note that the parameter $r$ is changed to $T$. The group $G_2$ consists of points defined over $\mathbb{F}_{q^k}$. Often $G_2$ can be represented by a subgroup $G'_2$ of a curve isomorphic to $E$ over $\mathbb{F}_{q^k}$. Let $d \mid k$; an elliptic curve $E'$ over $\mathbb{F}_{q^{k/d}}$ is called a twist of degree $d$ of $E/\mathbb{F}_{q^{k/d}}$ if there is an isomorphism $\psi : E' \to E$ defined over $\mathbb{F}_{q^k}$, and this is the smallest extension of $\mathbb{F}_{q^{k/d}}$ over which $\psi$ is defined. Depending on the $j$-invariant $j(E)$ of $E$, there exist twists of degree at most 6. Pairing-friendly curves with twists of degree higher than 2 arise from constructions with $j$-invariants $j(E) = 0$ and $j(E) = 1728$.

A twist of $E$ is given by $E' : y^2 = x^3 + a\omega^4 x + b\omega^6$ for some $\omega \in \mathbb{F}_{q^k}$. The isomorphism between $E'$ and $E$ is $\psi : E' \to E : (x', y') \mapsto (x'/\omega^2, y'/\omega^3)$, with inverse $\psi^{-1} : E \to E' : (x, y) \mapsto (\omega^2 x, \omega^3 y)$. Depending on $j(E)$ and $\omega$, we obtain the possible degrees of a twist $E'$ as summarized in Table 1. The isomorphism $\psi$ induces a group isomorphism $G'_2 \to G_2$, where $G'_2 = E'(\mathbb{F}_{q^{k/d}})[r]$. Thus, points in $G_2$ can be represented by their image under $\psi^{-1}$. In what follows, we write $P'$ for the point on the twist $E'$ corresponding to a point $P \in E$, i.e. $P' = \psi^{-1}(P)$ and $P = \psi(P')$. The last two columns in Table 1 show the subfields of $\mathbb{F}_{q^k}$ in which the coordinates of the specific points are contained. For example, $(\mathbb{F}_{q^{k/2}}, \mathbb{F}_{q^k})$ means that the $x$-coordinate is in $\mathbb{F}_{q^{k/2}}$ and the $y$-coordinate is in $\mathbb{F}_{q^k}$. The last column illustrates that the coordinates of $P'$ lie in the same fields as the coordinates of $Q$. The importance of this becomes evident in Section 3. Since the points in $G'_2$ are defined over a smaller field than those in $G_2$, curve arithmetic is more efficient in $G'_2$.
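The algebra of $\psi$ and $\psi^{-1}$ can be checked with a few lines of Python. This sketch assumes toy values for the prime, the curve coefficients and $\omega$, and the names are hypothetical. Because $\omega$ is taken in $\mathbb{F}_p$ here, $E'$ is isomorphic to $E$ over $\mathbb{F}_p$ itself. The check therefore only confirms that $(x', y') \mapsto (x'/\omega^2, y'/\omega^3)$ maps $E'$ onto $E$ and is inverted by $\psi^{-1}$. It does not model a genuine twist, for which $\omega$ lies in a proper extension.

```python
P = 101      # toy prime (illustrative assumption)
A, B = 3, 7  # E: y^2 = x^3 + A*x + B (arbitrary toy coefficients)
W = 5        # omega; taken in F_p, so E' is only a trivial twist here

def on_curve(pt, a, b):
    x, y = pt
    return (y * y - (x**3 + a * x + b)) % P == 0

A_TW = A * pow(W, 4, P) % P  # E': y^2 = x^3 + A*w^4*x + B*w^6
B_TW = B * pow(W, 6, P) % P

def psi(pt):
    """psi: E' -> E, (x', y') -> (x'/w^2, y'/w^3)."""
    x, y = pt
    return (x * pow(W, -2, P) % P, y * pow(W, -3, P) % P)

def psi_inv(pt):
    """psi^{-1}: E -> E', (x, y) -> (w^2 x, w^3 y)."""
    x, y = pt
    return (x * pow(W, 2, P) % P, y * pow(W, 3, P) % P)

twist_points = [(x, y) for x in range(P) for y in range(P)
                if on_curve((x, y), A_TW, B_TW)]
```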

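Table 1. The nature of the twist isomorphisms for twists of degree $d$

| $d$ | $j(E)$ | $a$, $b$ | fields of definition for powers of $\omega$ | $Q' = (x_{Q'}, y_{Q'})$; $P = (x_P, y_P)$ | $Q = \psi(Q')$; $P' = \psi^{-1}(P)$ |
|---|---|---|---|---|---|
| 2 | $\notin \{0, 1728\}$ | $a \neq 0$, $b \neq 0$ | $\omega^2, \omega^4, \omega^6 \in \mathbb{F}_{q^{k/2}}$; $\omega^3 \in \mathbb{F}_{q^k} \setminus \mathbb{F}_{q^{k/2}}$ | $(\mathbb{F}_{q^{k/2}}, \mathbb{F}_{q^{k/2}})$; $(\mathbb{F}_q, \mathbb{F}_q)$ | $(\mathbb{F}_{q^{k/2}}, \mathbb{F}_{q^k})$; $(\mathbb{F}_{q^{k/2}}, \mathbb{F}_{q^k})$ |
| 3 | 0 | $a = 0$, $b \neq 0$ | $\omega^3, \omega^6 \in \mathbb{F}_{q^{k/3}}$; $\omega^2 \in \mathbb{F}_{q^k} \setminus \mathbb{F}_{q^{k/3}}$ | $(\mathbb{F}_{q^{k/3}}, \mathbb{F}_{q^{k/3}})$; $(\mathbb{F}_q, \mathbb{F}_q)$ | $(\mathbb{F}_{q^k}, \mathbb{F}_{q^{k/3}})$; $(\mathbb{F}_{q^k}, \mathbb{F}_{q^{k/3}})$ |
| 4 | 1728 | $a \neq 0$, $b = 0$ | $\omega^4 \in \mathbb{F}_{q^{k/4}}$, $\omega^2 \in \mathbb{F}_{q^{k/2}}$; $\omega^3 \in \mathbb{F}_{q^k} \setminus \mathbb{F}_{q^{k/2}}$ | $(\mathbb{F}_{q^{k/4}}, \mathbb{F}_{q^{k/4}})$; $(\mathbb{F}_q, \mathbb{F}_q)$ | $(\mathbb{F}_{q^{k/2}}, \mathbb{F}_{q^k})$; $(\mathbb{F}_{q^{k/2}}, \mathbb{F}_{q^k})$ |
| 6 | 0 | $a = 0$, $b \neq 0$ | $\omega^6 \in \mathbb{F}_{q^{k/6}}$, $\omega^3 \in \mathbb{F}_{q^{k/3}}$; $\omega^2 \in \mathbb{F}_{q^{k/2}}$ | $(\mathbb{F}_{q^{k/6}}, \mathbb{F}_{q^{k/6}})$; $(\mathbb{F}_q, \mathbb{F}_q)$ | $(\mathbb{F}_{q^{k/2}}, \mathbb{F}_{q^{k/3}})$; $(\mathbb{F}_{q^{k/2}}, \mathbb{F}_{q^{k/3}})$ |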


Assume that $E$ has a twist of degree $d$ and that $d \mid k$. Let $e = k/d$ and $T_e = T^e \bmod r$. The twisted ate pairing is defined as

$$\eta_{T_e} : G_1 \times G_2 \to \mu_r, \quad (P, Q) \mapsto f_{T_e,P}(Q)^{(q^k - 1)/r}.$$

The reduced Tate and the twisted ate pairing are both defined on $G_1 \times G_2$, while the ate pairing is defined on $G_2 \times G_1$. We aim to treat both concepts of pairings simultaneously by fixing $R$ and $S$ as the first and second arguments, respectively, of either pairing. For both variants, we thus write $f_{m,R}(S)^{(q^k - 1)/r}$, where $m$, $R$ and $S$ are chosen according to the desired pairing. Miller's algorithm is used to compute the pairing as follows. Let $m = (m_{l-1}, \ldots, m_1, m_0)_2$ be the binary representation of $m$, initialize $U = R$, $f = 1$ and compute

1. for $i = l - 2$ down to 0 do

   (a) $f \leftarrow f^2 \cdot f_{\mathrm{DBL}(U)}(S)$, $U \leftarrow [2]U$,

   (b) if $m_i = 1$ then $f \leftarrow f \cdot f_{\mathrm{ADD}(U,R)}(S)$, $U \leftarrow U + R$.

2. $f \leftarrow f^{(q^k - 1)/r}$.

The function $f_{\mathrm{DBL}(U)}$ is defined as $f_{\mathrm{DBL}(U)} = l_{\mathrm{DBL}(U)}/v_{\mathrm{DBL}(U)}$, where $l_{\mathrm{DBL}(U)}$ is the function of the line tangent to $E$ at the point $U$ and $v_{\mathrm{DBL}(U)}$ is the function of the vertical line through $[2]U$. Analogously, the function $f_{\mathrm{ADD}(U,R)}$ is defined as $f_{\mathrm{ADD}(U,R)} = l_{\mathrm{ADD}(U,R)}/v_{\mathrm{ADD}(U,R)}$, where $l_{\mathrm{ADD}(U,R)}$ is the function of the line through the points $U$ and $R$, and $v_{\mathrm{ADD}(U,R)}$ is the function of the vertical line through $U + R$. If one of the inputs to the addition is given in affine representation, we speak of a “mixed addition” and use the abbreviation mADD.

Step 1 in the above algorithm is called the Miller loop; it computes the function value $f_{m,R}(S)$ up to $r$-th powers. Step 2, the final exponentiation, determines the final pairing value.
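Step 1 can be sketched generically in Python. This is a minimal sketch, not an optimized implementation: the point arithmetic and the line evaluations are passed in as functions (placeholders for the formulas of the later sections), and the toy run replaces points by integers $u$ standing for $[u]R$, only to show that the loop performs $l - 1$ doublings and one addition per nonzero bit below the leading one.

```python
def miller_loop(m, R, S, dbl, add, line_dbl, line_add, mul, sqr, one):
    """Step 1 of Miller's algorithm; the caller applies the final exponentiation.

    dbl(U) and add(U, R) implement the point arithmetic, line_dbl(U, S) and
    line_add(U, R, S) return f_DBL(U)(S) and f_ADD(U,R)(S), and mul, sqr and
    one describe the field in which f lives.
    """
    bits = bin(m)[2:]                    # (m_{l-1}, ..., m_0)_2 with m_{l-1} = 1
    U, f = R, one
    for bit in bits[1:]:                 # i = l-2 down to 0
        f = mul(sqr(f), line_dbl(U, S))  # (a) doubling step (DBL)
        U = dbl(U)
        if bit == "1":                   # (b) addition step (ADD)
            f = mul(f, line_add(U, R, S))
            U = add(U, R)
    return f, U


# Toy run: an integer u stands for the point [u]R, and the line functions
# only record which step was executed.
calls = []
f, U = miller_loop(
    45, 1, None,
    dbl=lambda u: 2 * u,
    add=lambda u, r: u + r,
    line_dbl=lambda u, s: calls.append("DBL") or 1,
    line_add=lambda u, r, s: calls.append("ADD") or 1,
    mul=lambda x, y: x * y,
    sqr=lambda x: x * x,
    one=1,
)
print(U, calls.count("DBL"), calls.count("ADD"))  # 45 5 3
```

With $m = 45 = (101101)_2$ we have $l = 6$, so the loop performs 5 doubling steps and 3 addition steps and ends with $U = [m]R$.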

The number of iterations of the Miller loop is equal to $l - 1$, where $l$ is the bitlength of $m$. Therefore, reducing the bitlength of $m$ reduces the number of iterations in the Miller loop, which reduces the cost of the pairing computation. Several papers have proposed methods for loop shortening [30, 32, 43, 42, 25]. For example, for the twisted ate pairing one can replace $T_e$ by any of its powers modulo $r$ and choose the smallest of those. A good choice for the ate pairing is the R-ate pairing [30], which often achieves an optimal loop length of $\log_2(r)/\varphi(k)$, yielding an optimal pairing [42].
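The power search for the twisted ate pairing is easy to illustrate. The sketch below enumerates $T_e^i \bmod r$ and returns the smallest residue. It is an illustration under the assumption that $r$ is prime; it skips the value 1 (which would give an empty loop) and does not check which powers keep the pairing non-degenerate. The toy values $r = 13$, $T_e = 4$ are arbitrary.

```python
def smallest_power_mod_r(T_e: int, r: int) -> int:
    """Smallest residue among T_e^i mod r (i >= 1), skipping the trivial value 1.

    Assumes r is prime and T_e is not 0 or 1 modulo r, so the powers cycle
    back to 1 after ord_r(T_e) steps and the loop terminates.
    """
    x = T_e % r
    if x in (0, 1):
        raise ValueError("T_e must be a nontrivial unit modulo r")
    best = x
    while x != 1:
        best = min(best, x)
        x = x * T_e % r
    return best


print(smallest_power_mod_r(4, 13))  # powers 4, 3, 12, 9, 10, 1 -> 3
```

In a real parameter set $r$ and $T_e$ are large, and the quantity of interest is the bitlength of the chosen power, which determines the number of Miller iterations.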

3 Computing the Ate Pairing Entirely on the Twisted Curve

Several authors have presented new formulas that achieve faster iterations of the Miller loop on certain curves [12, 14, 27, 2, 13]. The operation counts presented in these papers are given in the context of Tate pairing computations on curves with even embedding degrees, where all elliptic curve operations occur in the base field $\mathbb{F}_q$ and the functions in the Miller loop are evaluated at a point which has one coordinate in $\mathbb{F}_{q^{k/2}}$ and one in the full extension field $\mathbb{F}_{q^k}$. This allows for a relatively simple exposition. However, the ate pairing reverses the roles of the points involved and employs twisted curves. This means that some of the optimizations cannot be applied in the same fashion. The purpose of this section is to tidy up this discussion and to show how operation counts for the Tate pairing can be easily modified to give the analogous ate pairing count.

The usual practice when computing the ate pairing $a_T(Q, P)$ of the points $P \in E(\mathbb{F}_q)$ and $Q \in E(\mathbb{F}_{q^k})$ is to map the point $Q$ to the twisted curve using the isomorphism $\Psi^{-1}$, so that the point operations (doubling/addition) in the Miller loop can be performed more efficiently using the point $Q' = \Psi^{-1}(Q) \in E'(\mathbb{F}_{q^{k/d}})$, whose coordinates are defined over the smaller field $\mathbb{F}_{q^{k/d}}$. When it is time to compute the Miller line, $Q'$ is “untwisted” back to the full extension field via $Q = \Psi(Q')$. Operation counts for the Tate pairing do not carry over directly to the ate pairing. In particular, for the Tate pairing it is the $y$-coordinate of the second argument that is in the full extension field $\mathbb{F}_{q^k}$, whereas in the ate pairing one of the coordinates of the first argument is in $\mathbb{F}_{q^k}$. This means that all optimizations that were based on eliminating subfield elements have to be revised.

Furthermore, pairings on special curves such as Edwards curves and the curves in [13] pose conditions on cofactors of the group order. Galbraith [20] pointed out to us that for twists of degree larger than 2, $E$ and $E'$ cannot both simultaneously be in Edwards form. His arguments also apply to the curves in [13] with sextic twists. So far this meant that the formulas used for the point operations and the formulas derived for the Miller functions had to be treated separately, which usually results in a greater overall operation count.

We show that a small ($\le 6$) power of the ate pairing can be computed entirely on the twisted curve, rendering the above concerns obsolete. Our pairing can make use of loop shortening techniques just like the ate pairing, but only requires one curve (the twisted curve) to have particular properties. Furthermore, Table 1 shows that most coordinates of the twisted points $P'$ and $Q'$ are defined over subfields. Note that computing a small power of a pairing for efficiency reasons has been addressed in previous work; see, for example, [15].

Theorem 1. Let $E/\mathbb{F}_q: y^2 = x^3 + ax + b$ and let $E'/\mathbb{F}_{q^{k/d}}: y^2 = x^3 + a\omega^4 x + b\omega^6$ be a degree-$d$ twist of $E$. Let $\Psi$ be the associated twist isomorphism $\Psi: E' \to E: (x', y') \mapsto (x'/\omega^2, y'/\omega^3)$. Let $P \in G_1$, $Q \in G_2$, and let $Q' = \Psi^{-1}(Q)$ and $P' = \Psi^{-1}(P)$. Let $a_T(Q, P)$ be the ate pairing of $Q$ and $P$. Then

$$a_T(Q, P)^{\gcd(d,6)} = a_T(Q', P')^{\gcd(d,6)},$$

where $a_T(Q', P') = f_{T,Q'}(P')^{(q^k-1)/r}$ uses the same loop parameter as $a_T(Q, P)$ on $E$, but takes the two twisted points $Q'$ and $P'$ as inputs instead of $Q$ and $P$.

Proof. Since all factors of the Miller values that lie in a proper subfield of $\mathbb{F}_{q^k}$ vanish under the final exponentiation, it suffices to show that the Miller function updates at each iteration are equal, up to a constant defined over a proper subfield of $\mathbb{F}_{q^k}$. The computation of $a_T(Q, P)$ is composed of addition and doubling steps. Consider the gradients of the lines at the addition and doubling stages of the Miller loop, respectively. We have

$$\lambda' = \frac{y_2' - y_1'}{x_2' - x_1'} = \frac{\omega^3 (y_2 - y_1)}{\omega^2 (x_2 - x_1)} = \omega \frac{y_2 - y_1}{x_2 - x_1} \quad\text{and}\quad \lambda' = \frac{3x_1'^2 + a\omega^4}{2y_1'} = \frac{\omega^4 (3x_1^2 + a)}{\omega^3 \cdot 2y_1} = \omega \frac{3x_1^2 + a}{2y_1}$$

for addition and doubling. We write the update to the Miller function at the doubling step, $f_{DBL(U')}(P')$, as

$$\frac{l_{DBL(U')}(P')}{v_{DBL(U')}(P')} = \frac{y_{U'} - y_{P'} - \lambda'(x_{U'} - x_{P'})}{x_{P'} - x_{[2]U'}} = \frac{\omega^3 y_U - \omega^3 y_P - \omega\lambda(\omega^2 x_U - \omega^2 x_P)}{\omega^2 x_P - \omega^2 x_{[2]U}} = \omega \cdot f_{DBL(U)}(P),$$

where $\lambda$ and $\lambda'$ are the gradients determined before. We also have $f_{ADD(U',Q')}(P') = \omega \cdot f_{ADD(U,Q)}(P)$. Hence the two Miller values differ by a factor $\omega^N$ for some integer $N$. For twists of degree $d = 2$ and $d = 4$, observe that $\omega^2 = \omega^{\gcd(d,6)}$ is in a subfield of $\mathbb{F}_{q^k}$ and thus vanishes in the final exponentiation. Similarly, for $d = 3$ and $d = 6$, $\omega^3$ and $\omega^6$ are both in subfields of $\mathbb{F}_{q^k}$, so that introducing a factor of 3 and 6, respectively, to the exponent of $a_T(Q', P')$ gives an identical result to the computation of the same power of $a_T(Q, P)$. □
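The scaling in the proof can be checked numerically. The sketch below works over the rationals with a rational stand-in for $\omega$; in the actual setting $\omega \in \mathbb{F}_{q^k}$, but $\lambda' = \omega\lambda$ and $f_{DBL(U')}(P') = \omega \cdot f_{DBL(U)}(P)$ are purely algebraic identities. The toy curve $y^2 = x^3 - 2x + 5$, its points, and $\omega = 2$ are assumptions chosen for illustration, and the numerator is written as $y_S - y_U - \lambda(x_S - x_U)$, which differs from the proof only by a sign that cancels in the ratio.

```python
from fractions import Fraction as Fr


def on_curve(P, a, b):
    return P[1] ** 2 == P[0] ** 3 + a * P[0] + b


def slope_dbl(U, a):
    return (3 * U[0] ** 2 + a) / (2 * U[1])


def double(U, a):
    lam = slope_dbl(U, a)
    x3 = lam ** 2 - 2 * U[0]
    return x3, lam * (U[0] - x3) - U[1]


def f_dbl(U, S, a):
    """Affine doubling update l_DBL(U)(S) / v_DBL(U)(S)."""
    lam = slope_dbl(U, a)
    return (S[1] - U[1] - lam * (S[0] - U[0])) / (S[0] - double(U, a)[0])


def psi_inv(P, w):
    """Psi^{-1}: E -> E', (x, y) -> (w^2 x, w^3 y)."""
    return w ** 2 * P[0], w ** 3 * P[1]


a, b = Fr(-2), Fr(5)                   # toy curve E: y^2 = x^3 - 2x + 5 over Q
U, P = (Fr(1), Fr(2)), (Fr(2), Fr(3))  # two points on E
w = Fr(2)                              # rational stand-in for omega
a_tw, b_tw = a * w ** 4, b * w ** 6    # E': y^2 = x^3 + a w^4 x + b w^6
U_tw, P_tw = psi_inv(U, w), psi_inv(P, w)
assert all(on_curve(T, a, b) for T in (U, P))
assert all(on_curve(T, a_tw, b_tw) for T in (U_tw, P_tw))

lam_ratio = slope_dbl(U_tw, a_tw) / slope_dbl(U, a)
f_ratio = f_dbl(U_tw, P_tw, a_tw) / f_dbl(U, P, a)
print(lam_ratio, f_ratio)  # 2 2, i.e. both ratios equal w
```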

Corollary 2. If $a_T(Q, P)$ is bilinear and non-degenerate, then so is $a_T(Q', P')$.

Remark 3. Note that for $d = 6$ both $\omega^2$ and $\omega^3$ are in proper subfields of $\mathbb{F}_{q^k}$. Thus their contributions to the denominator and numerator vanish in the final exponentiation, so there is no need to introduce a factor of 6 to the final exponent. That is, for sextic twists it is actually always the case that $a_T(Q, P) = a_T(Q', P')$. If denominator elimination is used for $d = 6$, the values differ by $\omega^3$, which lies in a subfield. For $k = 12$ and BN curves this case was considered by Akane, Nogami, and Morikawa [1], who showed that $a_T(Q, P) = a_T(Q', P')$ up to constants from subfields.

For the other cases either $\omega^2$ or $\omega^3$ lies in a proper subfield $\mathbb{F}_{q^e}$ of $\mathbb{F}_{q^k}$. If 4 or 9, respectively, divides $\prod_{d \mid k} \Phi_d(q)/(q^e - 1)$, we obtain $\omega^{(q^k-1)/r} = 1$ and thus automatically $a_T(Q, P) = a_T(Q', P')$. However, in general these conditions are not satisfied, and the extra power of 2 or 3 is needed to obtain the same result.

Computing the ate pairing as $a_T(Q', P')$ and using twists as in Table 1 implies (for $d < 6$) that the only coordinate that lies in the full extension field $\mathbb{F}_{q^k}$ belongs to the second argument; for $d = 6$ all coordinates are defined over subfields. In this sense, the field operations encountered in computing the ate pairing $a_T(Q', P')$ on $E'$ mimic those encountered in computing the Tate pairing $e_r(P, Q)$ on $E$. Thus, point operation and line computation formulas that work for the Tate pairing can be applied directly to the ate pairing.

Inversions in $\mathbb{F}_{q^k}$ are prohibitively expensive, so for all curve types we will show a way to eliminate denominators. Therefore, at the doubling or addition stage of a Miller iteration the update function is given by a polynomial $f = \sum_{i,j} L_{i,j}\, x_S^i y_S^j$, where the $L_{i,j}$ are functions solely of the intermediate point $U$ (doubling) or of the intermediate point $U$ and the base point $R$ (addition). In the Tate pairing computation of $e_r(P, Q)$, the $L_{i,j}$ are functions of some multiple of the point $P \in E(\mathbb{F}_q)$, and therefore all calculations required to compute the $L_{i,j}$ are performed in the base field $\mathbb{F}_q$. Similarly, in the modified definition of the ate pairing computation of $a_T(Q', P')$, the $L_{i,j}$ are functions of some multiple of the point $Q' \in E'(\mathbb{F}_{q^e})$, and therefore all calculations required to compute the $L_{i,j}$ in this case are performed in the subfield $\mathbb{F}_{q^e}$. Thus, if the computations of the $L_{i,j}$ in an iteration of the Tate pairing require $m\,\mathbf{m}_1 + s\,\mathbf{s}_1$, where $\mathbf{m}_1$ and $\mathbf{s}_1$ denote multiplication and squaring in $\mathbb{F}_q$, then the equivalent computations in an iteration of the ate pairing will require $m\,\mathbf{m}_e + s\,\mathbf{s}_e$, where $\mathbf{m}_e$ and $\mathbf{s}_e$ denote multiplication and squaring in $\mathbb{F}_{q^e}$; a multiplication by the curve constant $a$ costs $\mathbf{d}_a$.

For even embedding degrees (admitting quadratic, quartic, or sextic twists) the function update always simplifies to $f = L_{1,0}x_S + L_{0,1}y_S + L_{0,0}$, so that two extra multiplications are required here ($L_{1,0}$ by $x_S$ and $L_{0,1}$ by $y_S$). In the Tate pairing as well as in the ate pairing each of these multiplications costs $e = k/d$ base field multiplications if field extensions are represented in a suitable way. If $k$ is odd and divisible by three and the curve admits a cubic twist, the function update requires more terms. For comparison, let there be $h_{ADD}$ non-zero terms (excluding $L_{0,0}$) in the addition step and $h_{DBL}$ in the doubling step, each of which costs $e = k/3$ base field multiplications. We summarize the situation for different twists in Table 2.

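Table 2. Converting operation counts for single addition and doubling steps in the Tate pairing $e_r(P, Q)$ and the ate pairing $a_T(Q', P')$

| $k$ even | DBL | ADD / mADD |
| Tate: $e_r(P, Q)$ | $m_1\mathbf{m}_1 + s_1\mathbf{s}_1 + 2e\,\mathbf{m}_1 + \mathbf{m}_k + \mathbf{s}_k$ | $m_2\mathbf{m}_1 + s_2\mathbf{s}_1 + 2e\,\mathbf{m}_1 + \mathbf{m}_k$ |
| Ate: $a_T(Q', P')$ | $m_1\mathbf{m}_e + s_1\mathbf{s}_e + 2e\,\mathbf{m}_1 + \mathbf{m}_k + \mathbf{s}_k$ | $m_2\mathbf{m}_e + s_2\mathbf{s}_e + 2e\,\mathbf{m}_1 + \mathbf{m}_k$ |
| $k$ odd, $3 \mid k$ | DBL | ADD / mADD |
| Tate: $e_r(P, Q)$ | $m_1\mathbf{m}_1 + s_1\mathbf{s}_1 + h_{DBL}\,e\,\mathbf{m}_1 + \mathbf{m}_k + \mathbf{s}_k$ | $m_2\mathbf{m}_1 + s_2\mathbf{s}_1 + h_{ADD}\,e\,\mathbf{m}_1 + \mathbf{m}_k$ |
| Ate: $a_T(Q', P')$ | $m_1\mathbf{m}_e + s_1\mathbf{s}_e + h_{DBL}\,e\,\mathbf{m}_1 + \mathbf{m}_k + \mathbf{s}_k$ | $m_2\mathbf{m}_e + s_2\mathbf{s}_e + h_{ADD}\,e\,\mathbf{m}_1 + \mathbf{m}_k$ |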


In what follows, whenever we omit the subscripts from the operation costs and write $\mathbf{m}$ and $\mathbf{s}$, we mean $\mathbf{m}_1$, $\mathbf{s}_1$ for the Tate pairing computation and $\mathbf{m}_e$, $\mathbf{s}_e$ for the ate pairing computation.

Remark 4. Note that by Theorem 1 the computation of $a_T(Q', P')^{\gcd(d,6)}$ can be done entirely on the twisted curve. This means that Edwards curves can be employed in the ate setting if we choose the original curve such that the twisted curve can be written in Edwards form.

All curves we consider in the following are defined over the prime field $\mathbb{F}_p$. We therefore restrict to the case $q = p$ from now on.

4 Pairings on $y^2 = x^3 + ax$ with Even Embedding Degrees

The only curves which admit quartic twists over $\mathbb{F}_p$ are of the form $E: y^2 = x^3 + ax$. In this section we assume that the embedding degree $k$ is even, and so by Table 1 we can use that the $x$-coordinates of $Q$ (used in the Tate pairing) and of $P'$ (used in our modified ate pairing) are defined over a subfield of $\mathbb{F}_{p^k}$. Using the naming convention introduced in Section 2, $x_S$ is defined over a subfield of $\mathbb{F}_{p^k}$, while $y_S$ is minimally defined over $\mathbb{F}_{p^k}$.

Curves of the form $E: y^2 = x^3 + ax$ have not received much attention, even for simple elliptic curve arithmetic; e.g., no special formulas were reported in the EFD [10] before our paper. We present new formulas for addition and doubling in a new coordinate system, which we call “weight-(1,2) coordinates”. The point $(X : Y : Z)$ corresponds to the affine point $(x, y)$, where $x = X/Z$ and $y = Y/Z^2$. The projective curve equation for these weights is $Y^2 = X^3 Z + aXZ^3$. Lopez and Dahab studied such coordinates in the context of elliptic curves over binary fields, but these weights have not been used in the context of curves over odd-characteristic fields.

It is quite remarkable that our doubling formulas are faster than any doubling 
formulas reported for elliptic curves in the EFD. 

We extend the explicit formulas for curve operations to compute the doubling 
and the addition step on these curves. The resulting pairing computations are 
also significantly faster than their predecessors. 

Doubling formulas. For this curve shape the affine doubling formulas to compute $(x_3, y_3) = [2]U = [2](x_1, y_1)$ simplify to $x_3 = \lambda^2 - 2x_1$, $y_3 = \lambda(x_1 - x_3) - y_1$, where $\lambda = (3x_1^2 + a)/(2y_1)$. In weight-(1,2) coordinates the doubling formulas to compute $(X_3 : Y_3 : Z_3) = [2](X_1 : Y_1 : Z_1)$ become

$$X_3 = (X_1^2 - aZ_1^2)^2, \quad Y_3 = 2Y_1(X_1^2 - aZ_1^2)\big((X_1^2 + aZ_1^2)^2 + 4aZ_1^2 X_1^2\big), \quad Z_3 = 4Y_1^2.$$

The point doubling needs $1m + 6s + 1d_a$ using the following sequence of operations.

$$A = X_1^2,\ B = Y_1^2,\ C = Z_1^2,\ D = aC,\ X_3 = (A - D)^2, \qquad (1)$$
$$E = 2(A + D)^2 - X_3,\ F = (A - D + Y_1)^2 - B - X_3,\ Y_3 = E \cdot F,\ Z_3 = 4B.$$

These formulas are now the fastest doubling formulas reported in the EFD [10]. They are faster, by one s-m tradeoff, than the previous champion, “dbl-20090311-hwcd” due to Hisil, Wong, Carter, and Dawson. Those formulas are optimized for “Doubling-oriented XXYZZR coordinates for Jacobi quartics” and need $2m + 5s + 1d_a$, where $a$ is some curve constant.
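The doubling sequence can be sanity-checked against the affine formulas. The following Python sketch is a consistency test only: the prime $p = 1039$, the constant $a = 3$, and the scaling $Z_1 = 7$ are arbitrary toy choices with no cryptographic meaning, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
p = 1039   # toy prime with p = 3 (mod 4), so square roots are a single pow()
a = 3      # toy constant for E: y^2 = x^3 + a*x


def find_point(a, p):
    """Return an affine point (x, y) with y != 0 on y^2 = x^3 + a x over F_p."""
    for x in range(1, p):
        rhs = (x ** 3 + a * x) % p
        y = pow(rhs, (p + 1) // 4, p)   # square root when one exists, since p = 3 (mod 4)
        if rhs and y * y % p == rhs:
            return x, y
    raise ValueError("no point found")


def affine_double(P, a, p):
    x1, y1 = P
    lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    x3 = (lam * lam - 2 * x1) % p
    return x3, (lam * (x1 - x3) - y1) % p


def dbl_w12(X1, Y1, Z1, a, p):
    """Doubling in weight-(1,2) coordinates, x = X/Z, y = Y/Z^2 (1m + 6s + 1d_a)."""
    A = X1 * X1 % p
    B = Y1 * Y1 % p
    C = Z1 * Z1 % p
    D = a * C % p
    X3 = (A - D) ** 2 % p
    E = (2 * (A + D) ** 2 - X3) % p
    F = ((A - D + Y1) ** 2 - B - X3) % p
    return X3, E * F % p, 4 * B % p


x1, y1 = find_point(a, p)
Z1 = 7                                    # arbitrary nonzero scaling
X3, Y3, Z3 = dbl_w12(x1 * Z1 % p, y1 * Z1 * Z1 % p, Z1, a, p)
z_inv = pow(Z3, -1, p)
ok = (X3 * z_inv % p, Y3 * z_inv * z_inv % p) == affine_double((x1, y1), a, p)
print(ok)  # True
```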

Line computation for doubling. In the doubling step of the pairing computation we need to compute $[2]U$ and to compute the line function at $U$ and evaluate it at $S = (x_S, y_S)$. The affine formula for the computation of $f_{DBL(U)}(S)$ is given by

$$f_{DBL(U)}(S) = \frac{\lambda(X_1/Z_1 - x_S) + y_S - Y_1/Z_1^2}{x_S - (\lambda^2 - 2X_1/Z_1)} = \frac{-2Y_1\big(-(3X_1^2 Z_1 + aZ_1^3)\, x_S + (2Y_1 Z_1)\, y_S + X_1^3 - aZ_1^2 X_1\big)}{-(4Y_1^2 Z_1)\, x_S + 9X_1^4 Z_1 + 6aX_1^2 Z_1^3 + a^2 Z_1^5 - 8X_1 Y_1^2}.$$

Since every element except $y_S$ lies in a proper subfield of $\mathbb{F}_{p^k}$, we can omit computing the entire denominator and also the multiplication by $-Y_1$. We keep the factor of 2 to obtain an s-m tradeoff. The simplified line function is

$$f_{DBL(U)}(S) = -2(3X_1^2 Z_1 + aZ_1^3)\cdot x_S + (4Y_1 Z_1)\cdot y_S + 2(X_1^3 - aZ_1^2 X_1).$$

We write $f_{DBL(U)}(S) = L_{1,0}\cdot x_S + L_{0,1}\cdot y_S + L_{0,0}$ and compute $L_{1,0}$, $L_{0,1}$, and $L_{0,0}$ as

$$L_{1,0} = -2Z_1(3A + D),\quad L_{0,1} = 2\big((Y_1 + Z_1)^2 - B - C\big),\quad L_{0,0} = (X_1 + A - D)^2 - X_3 - A,$$

using the values computed in (1), at an additional cost of $1m + 2s$, so that the total operation count for point doubling with line computation is $2(k/d)m_1 + 2m + 8s + 1d_a$.
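A toy check confirms that the coefficients $L_{1,0}$, $L_{0,1}$, $L_{0,0}$ describe the tangent at $U$, i.e. that the line vanishes at $U$ and at $-[2]U$. The sketch below assumes arbitrary small parameters ($p = 1039$, $a = 3$, $Z_1 = 5$) and evaluates the line at points of $E(\mathbb{F}_p)$ rather than at a point with a coordinate in $\mathbb{F}_{p^k}$; this suffices because the vanishing property is purely algebraic.

```python
p, a = 1039, 3   # toy parameters for E: y^2 = x^3 + a*x (no cryptographic meaning)


def find_point(a, p):
    """Return an affine point (x, y) with y != 0 on y^2 = x^3 + a x over F_p."""
    for x in range(1, p):
        rhs = (x ** 3 + a * x) % p
        y = pow(rhs, (p + 1) // 4, p)   # square root when one exists, since p = 3 (mod 4)
        if rhs and y * y % p == rhs:
            return x, y
    raise ValueError("no point found")


def dbl_line_w12(X1, Y1, Z1, a, p):
    """Coefficients (L10, L01, L00), reusing A, B, C, D, X3 from the doubling."""
    A, B, C = X1 * X1 % p, Y1 * Y1 % p, Z1 * Z1 % p
    D = a * C % p
    X3 = (A - D) ** 2 % p
    L10 = -2 * Z1 * (3 * A + D) % p
    L01 = 2 * ((Y1 + Z1) ** 2 - B - C) % p
    L00 = ((X1 + A - D) ** 2 - X3 - A) % p
    return L10, L01, L00


x1, y1 = find_point(a, p)
Z1 = 5
L10, L01, L00 = dbl_line_w12(x1 * Z1 % p, y1 * Z1 * Z1 % p, Z1, a, p)

# Affine [2]U, used only to locate the third intersection point -[2]U.
lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
x3 = (lam * lam - 2 * x1) % p
y3 = (lam * (x1 - x3) - y1) % p


def line(x, y):
    return (L10 * x + L01 * y + L00) % p


print(line(x1, y1), line(x3, -y3 % p))  # 0 0
```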
Addition and mixed addition. In affine coordinates, the sum $(x_3, y_3) = U + R = (x_1, y_1) + (x_2, y_2)$ is given by $x_3 = \lambda^2 - x_1 - x_2$, $y_3 = \lambda(x_1 - x_3) - y_1$, where $\lambda = (y_1 - y_2)/(x_1 - x_2)$. In weight-(1,2) coordinates, $(X_3 : Y_3 : Z_3) = (X_1 : Y_1 : Z_1) + (X_2 : Y_2 : Z_2)$ becomes

$$X_3 = (Y_1 Z_2^2 - Y_2 Z_1^2)^2 - (X_1 Z_2 + X_2 Z_1)T,$$
$$Y_3 = \big((Y_1 Z_2^2 - Y_2 Z_1^2)(X_1 Z_2 T - X_3) - Y_1 Z_2^2 T U\big) U Z_1 Z_2,$$
$$Z_3 = (U Z_1 Z_2)^2,$$

where $T = (X_1 Z_2 - X_2 Z_1)^2 Z_1 Z_2$ and $U = X_1 Z_2 - X_2 Z_1$. This addition can be computed in $10m + 7s$ using

$$A = Z_1^2,\ B = Z_2^2,\ C = (Z_1 + Z_2)^2 - A - B,\ D = X_1 \cdot Z_2,\ E = X_2 \cdot Z_1,$$
$$F = Y_1 \cdot B,\ G = Y_2 \cdot A,\ H = D - E,\ I = 2(F - G),\ II = I^2,\ J = C \cdot H,$$
$$K = 4J \cdot H,\ X_3 = 2\,II - (D + E)\cdot K,\ Z_3 = J^2,$$
$$Y_3 = \big((J + I)^2 - Z_3 - II\big)\cdot(D \cdot K - X_3) - F \cdot K^2,\ Z_3 = 2Z_3.$$

For mixed addition, i.e. $Z_2 = 1$, the number of operations reduces to $8m + 5s$, omitting the computation of $B$, $C$, $D$, and $F$.
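The addition sequence and its mixed variant ($Z_2 = 1$) can be compared with the affine sum in the same way. This Python sketch uses assumed toy parameters ($p = 1039$, $a = 3$, scalings 3 and 11) and helper names chosen for illustration only; it requires $x_1 \neq x_2$.

```python
p, a = 1039, 3   # toy parameters for E: y^2 = x^3 + a*x (no cryptographic meaning)


def find_points(a, p, n):
    """First n affine points with y != 0 (distinct x) on y^2 = x^3 + a x over F_p."""
    pts = []
    for x in range(1, p):
        rhs = (x ** 3 + a * x) % p
        y = pow(rhs, (p + 1) // 4, p)   # square root when one exists, since p = 3 (mod 4)
        if rhs and y * y % p == rhs:
            pts.append((x, y))
            if len(pts) == n:
                return pts
    raise ValueError("not enough points")


def affine_add(P, Q, p):
    (x1, y1), (x2, y2) = P, Q
    lam = (y1 - y2) * pow(x1 - x2, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p


def add_w12(P1, P2, p):
    """Addition in weight-(1,2) coordinates (10m + 7s); mixed if Z2 = 1."""
    (X1, Y1, Z1), (X2, Y2, Z2) = P1, P2
    A, B = Z1 * Z1 % p, Z2 * Z2 % p
    C = ((Z1 + Z2) ** 2 - A - B) % p
    D, E = X1 * Z2 % p, X2 * Z1 % p
    F, G = Y1 * B % p, Y2 * A % p
    H = (D - E) % p
    I = 2 * (F - G) % p
    II = I * I % p
    J = C * H % p
    K = 4 * J * H % p
    X3 = (2 * II - (D + E) * K) % p
    Z3 = J * J % p
    Y3 = (((J + I) ** 2 - Z3 - II) * (D * K - X3) - F * K * K) % p
    return X3, Y3, 2 * Z3 % p


def to_w12(P, Z, p):
    return P[0] * Z % p, P[1] * Z * Z % p, Z


def from_w12(P, p):
    z_inv = pow(P[2], -1, p)
    return P[0] * z_inv % p, P[1] * z_inv * z_inv % p


U, R = find_points(a, p, 2)
expected = affine_add(U, R, p)
general = from_w12(add_w12(to_w12(U, 3, p), to_w12(R, 11, p), p), p)
mixed = from_w12(add_w12(to_w12(U, 3, p), to_w12(R, 1, p), p), p)
print(general == expected, mixed == expected)  # True True
```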

Line computation for addition and mixed addition. For affine points $U$, $R$, and $S$ the line function is given by $f_{ADD(U,R)}(S) = \big(\lambda(x_2 - x_S) + y_S - y_2\big)/(x_S - x_3)$, and we can omit the denominator because it is entirely defined over a subfield of $\mathbb{F}_{p^k}$. In weight-(1,2) coordinates the modified line function becomes $f_{ADD(U,R)}(S) = I \cdot X_2 Z_2 - I \cdot x_S Z_2^2 + J \cdot y_S Z_2^2 - J \cdot Y_2$. The values $X_2 Z_2$, $x_S Z_2^2$, and $y_S Z_2^2$ do not change during the computation and can thus be precomputed. For the Tate pairing the cost of one addition step (computation of addition and line function) therefore is $(2k/d)m_1 + 12m + 7s$. If $d = 2$ it is possible to save $1m$ by computing $I \cdot (X_2 Z_2 - x_S Z_2^2)$. When computing the ate pairing, the multiplications in $I \cdot X_2 Z_2 - I \cdot x_S Z_2^2 + J \cdot y_S Z_2^2 - J \cdot Y_2$ cost $1m$ each, given the shape of $x_S$ and $y_S$. The cost of one addition step (computation of addition and line function) in the ate pairing therefore is $14m + 7s$.

For mixed additions ($Z_2 = 1$) this simplifies to $f_{mADD(U,R)}(S) = I \cdot X_2 - I \cdot x_S + J \cdot y_S - J \cdot Y_2$, costing $(2k/d)m_1 + 10m + 5s$ for a complete mixed addition step in both the ate and the Tate pairing. For $d = 2$, again $1m$ can be saved in the Tate pairing.

If $R$ is reused several times in the Tate pairing, it might be worthwhile to precompute $1/Y_2$ for long-term usage. At the beginning of a pairing computation, $\bar X_2 = X_2/Y_2$, $\bar x_S = x_S/Y_2$, and $\bar y_S = y_S/Y_2$ are computed. Since $Y_2$ lies in $\mathbb{F}_p$, $f_{mADD(U,R)}(S)$ can be replaced by

$$f_{mADD(U,R)}(S)/Y_2 = I \cdot \bar X_2 - I \cdot \bar x_S + J \cdot \bar y_S - J$$

without changing the pairing value. Note also that Table 1 shows that $\bar x_S$ and $\bar y_S$ are defined over the same fields as $x_S$ and $y_S$. In this case a mixed addition step costs only $(2k/d)m_1 + 9m + 5s$.
If instead $S$ is reused several times in the ate pairing, similar savings are possible. It is useful to precompute $1/\bar y_S$ and to update the function by

$$f_{mADD(U,R)}(S)/\bar y_S = I \cdot \bar X_2 - I \cdot \bar x_S + J\omega^3 - J \cdot \bar Y_2,$$

where $\bar X_2 = X_2/\bar y_S$, $\bar x_S = x_S/\bar y_S$, $\bar Y_2 = Y_2/\bar y_S$, and $y_S = \bar y_S\,\omega^3$ with $\bar y_S \in \mathbb{F}_p$. In this case a mixed addition step costs only $(2k/d)m_1 + 9m + 5s$.

Note that these savings are compatible with the saving for $d = 2$.

Depending on the representation of $\mathbb{F}_{p^k}$ over $\mathbb{F}_{p^{k/2}}$ and $\mathbb{F}_{p^{k/d}}$, it is possible to save operations in the other cases.

5 Pairings on $y^2 = x^3 + b$ with Even Embedding Degrees

The only curves which can have sextic twists over $\mathbb{F}_p$ are of the form $E: y^2 = x^3 + b$. In this section we assume that the embedding degree $k$ is even, and so by Table 1 we can use that the $x$-coordinates of $Q$ (in $e_r(P, Q)$) and of $P'$ (in $a_T(Q', P')$) are defined over a subfield of $\mathbb{F}_{p^k}$. Using the naming convention introduced in Section 2, $x_S$ is defined over a subfield of $\mathbb{F}_{p^k}$, while $y_S$ might be defined over $\mathbb{F}_{p^k}$. Note that if $d = 6$, $y_S$ is also defined over a proper subfield, namely $\mathbb{F}_{p^{k/3}}$. For these curves we obtained the best results in standard projective coordinates, where the curve equation $y^2 = x^3 + b$ becomes $Y^2 Z = X^3 + bZ^3$.

When $b$ is a square in $\mathbb{F}_p$, the curve $E$ always has a point of order 3; otherwise such a point never exists in $E(\mathbb{F}_p)$. The former case was extensively studied in [13] in the context of the Tate pairing. The addition formulas are independent of the nature of the curve constant $b$ and can therefore also be used for non-square $b$. We slightly improve these addition formulas in the second half of this section and use them for all curves with $a = 0$. The first part of this section focuses on achieving faster operation counts at the Miller doubling stage on general curves of the form $E: y^2 = x^3 + b$, where we make no assumptions about the nature of the curve constant $b$ (and consequently the order of $E$).

Point doubling and line computation. The affine doubling formulas differ from those in Section 4 in the definition of $\lambda$. We have $\lambda = 3x_1^2/(2y_1)$. In projective coordinates and after eliminating $X_1^3$ via the curve equation, we obtain $(X_3 : Y_3 : Z_3) = [2](X_1 : Y_1 : Z_1)$ as

$$X_3 = 2X_1Y_1(Y_1^2 - 9bZ_1^2), \quad Y_3 = Y_1^4 + 18bY_1^2Z_1^2 - 27b^2Z_1^4, \quad Z_3 = 8Y_1^3Z_1.$$

We homogenize the affine doubling line using $x_1 = X_1/Z_1$ and $y_1 = Y_1/Z_1$ and get

$$f_{DBL(U)}(S) = 3X_1^2 \cdot x_S - 2Y_1Z_1 \cdot y_S + 3bZ_1^2 - Y_1^2.$$

We write $f_{DBL(U)}(S) = L_{1,0}\cdot x_S + L_{0,1}\cdot y_S + L_{0,0}$ and compute $L_{1,0}$, $L_{0,1}$, $L_{0,0}$ and the point $(X_3 : Y_3 : Z_3)$ using the following sequence of operations.

$$A = X_1^2,\ B = Y_1^2,\ C = Z_1^2,\ D = 3bC,\ E = (X_1 + Y_1)^2 - A - B,$$
$$F = (Y_1 + Z_1)^2 - B - C,\ G = 3D,\ X_3 = E \cdot (B - G),$$
$$Y_3 = (B + G)^2 - 12D^2,\ Z_3 = 4B \cdot F,\ L_{1,0} = 3A,\ L_{0,1} = -F,\ L_{0,0} = D - B.$$
The total count for the above sequence of operations is $2m + 7s + 1d_b$ in addition to the multiplications by $x_S$ and $y_S$. Note that doubling outside the context of pairings would omit the computation of $A$ and would obtain $E = 2X_1Y_1$, needing a total of $3m + 5s + 1d_b$. As doubling formulas they are not competitive with those in the EFD, but they are almost the fastest for the doubling step in pairings, second only to $y^2 = x^3 + c^2$ in [13].
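The following Python sketch checks the doubling sequence for $y^2 = x^3 + b$ in standard projective coordinates against affine doubling, and checks that the line $L_{1,0}x + L_{0,1}y + L_{0,0}$ vanishes at $U$ and at $-[2]U$. The toy prime $p = 1039$ (with $p \equiv 3 \pmod 4$ and $p \equiv 1 \pmod 3$), $b = 5$, and $Z_1 = 6$ are assumptions made for illustration.

```python
p, b = 1039, 5   # toy parameters for E: y^2 = x^3 + b; p = 3 (mod 4), p = 1 (mod 3)


def find_point(b, p):
    """Return an affine point (x, y) with y != 0 on y^2 = x^3 + b over F_p."""
    for x in range(1, p):
        rhs = (x ** 3 + b) % p
        y = pow(rhs, (p + 1) // 4, p)   # square root when one exists, since p = 3 (mod 4)
        if rhs and y * y % p == rhs:
            return x, y
    raise ValueError("no point found")


def dbl_line_proj(X1, Y1, Z1, b, p):
    """[2](X1 : Y1 : Z1) in projective coordinates and (L10, L01, L00): 2m + 7s + 1d_b."""
    A, B, C = X1 * X1 % p, Y1 * Y1 % p, Z1 * Z1 % p
    D = 3 * b * C % p
    E = ((X1 + Y1) ** 2 - A - B) % p
    F = ((Y1 + Z1) ** 2 - B - C) % p
    G = 3 * D % p
    X3 = E * (B - G) % p
    Y3 = ((B + G) ** 2 - 12 * D * D) % p
    Z3 = 4 * B * F % p
    return (X3, Y3, Z3), (3 * A % p, -F % p, (D - B) % p)


x1, y1 = find_point(b, p)
Z1 = 6                                             # arbitrary nonzero scaling
(X3, Y3, Z3), (L10, L01, L00) = dbl_line_proj(x1 * Z1 % p, y1 * Z1 % p, Z1, b, p)

lam = 3 * x1 * x1 * pow(2 * y1, -1, p) % p        # affine reference values
x3 = (lam * lam - 2 * x1) % p
y3 = (lam * (x1 - x3) - y1) % p
z_inv = pow(Z3, -1, p)
same_point = (X3 * z_inv % p, Y3 * z_inv % p) == (x3, y3)


def line(x, y):
    return (L10 * x + L01 * y + L00) % p


print(same_point, line(x1, y1), line(x3, -y3 % p))  # True 0 0
```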

Addition, mixed addition and line computation. For the addition of points on $y^2 = x^3 + b$, we adopt the formulas obtained in [13] for curves of the form $y^2 = x^3 + c^2$. These addition and line computation formulas are independent of whether $b$ is a square. The cost for an addition is $12m + 2s$. The addition line in [13] can be written as

$$f_{ADD(U,R)}(S) = (Y_1Z_2 - Y_2Z_1)\cdot X_2 - (Y_1Z_2 - Y_2Z_1)\cdot x_S Z_2 + (X_1Z_2 - X_2Z_1)\cdot y_S Z_2 - (X_1Z_2 - X_2Z_1)\cdot Y_2.$$

Note that the coefficients appear as subexpressions in the mixed addition of $U$ and $R$, so computing $f_{ADD(U,R)}(S)$ as above costs an extra $(2k/d)m_1 + 2m$ for the Tate pairing and an extra $4m$ for the ate pairing.

If $R = (X_2 : Y_2 : 1)$, the addition $U + R$ becomes a mixed addition and costs $9m + 2s$. Computing the addition and the line as $f_{mADD(U,R)}(S) = (Y_1 - Y_2Z_1)\cdot X_2 - (Y_1 - Y_2Z_1)\cdot x_S + (X_1 - X_2Z_1)\cdot y_S - (X_1 - X_2Z_1)\cdot Y_2$ costs an extra $(2k/d)m_1 + 2m$ for both the Tate and the ate pairing.

If $R$ or $S$ is fixed in the mixed addition, similar comments to Section 4 apply, reducing the extra costs to only $(2k/d)m_1 + 1m$.
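The line formula for general and mixed addition can be checked the same way: multiplied out, it is $(X_1Z_2 - X_2Z_1)Z_2$ times the line through $U$ and $R$, so it must vanish at $U$, $R$, and $-(U + R)$. A minimal Python sketch with assumed toy parameters and hypothetical helper names:

```python
p, b = 1039, 5   # toy parameters for E: y^2 = x^3 + b (no cryptographic meaning)


def find_points(b, p, n):
    """First n affine points with y != 0 (distinct x) on y^2 = x^3 + b over F_p."""
    pts = []
    for x in range(1, p):
        rhs = (x ** 3 + b) % p
        y = pow(rhs, (p + 1) // 4, p)   # square root when one exists, since p = 3 (mod 4)
        if rhs and y * y % p == rhs:
            pts.append((x, y))
            if len(pts) == n:
                return pts
    raise ValueError("not enough points")


def add_line_proj(U, R, x_S, y_S, p):
    """Evaluate the addition line at S = (x_S, y_S); U and R are projective."""
    (X1, Y1, Z1), (X2, Y2, Z2) = U, R
    t1 = (Y1 * Z2 - Y2 * Z1) % p
    t2 = (X1 * Z2 - X2 * Z1) % p
    return (t1 * X2 - t1 * x_S * Z2 + t2 * y_S * Z2 - t2 * Y2) % p


u, r = find_points(b, p, 2)
lam = (u[1] - r[1]) * pow(u[0] - r[0], -1, p) % p
x3 = (lam * lam - u[0] - r[0]) % p
y3 = (lam * (u[0] - x3) - u[1]) % p
U = (u[0] * 4 % p, u[1] * 4 % p, 4)                # projective, Z1 = 4
results = []
for Z2 in (1, 9):                                  # mixed (Z2 = 1) and general addition
    R = (r[0] * Z2 % p, r[1] * Z2 % p, Z2)
    results.append([add_line_proj(U, R, x, y, p) for x, y in (u, r, (x3, -y3 % p))])
print(results)  # [[0, 0, 0], [0, 0, 0]]
```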

6 Fast Formulas for Pairing Computations with Cubic Twists

For an odd embedding degree $k$, the only possible non-trivial twists are cubic twists, and these only exist for curves of the form $y^2 = x^3 + b$, requiring also that $3 \mid k$. Table 1 shows that in this scenario the point $S = (x_S, y_S)$ has $x_S$ defined over the full extension field $\mathbb{F}_{p^k}$ and $y_S$ defined over a subfield. The formulas obtained in most publications, including the previous sections, use denominator elimination based on $x_S$ being in a subfield.

In this section we present fast formulas for addition and doubling steps for $y^2 = x^3 + b$ and optimize them using the fact that $y_S$, $y_U$, and $x_U$ are in a proper subfield of $\mathbb{F}_{p^k}$, while $x_S$ is not. Our results are significantly faster than other studies of this case, but nevertheless the cases with even embedding degree offer more advantages. For curves of the form $y^2 = x^3 + b$, Lin et al. [31] observed that $1/v_{DBL(U)}(S)$ can be written as

$$\frac{1}{v_{DBL(U)}(S)} = \frac{1}{x_S - x_{[2]U}} = \frac{x_S^2 + x_S x_{[2]U} + x_{[2]U}^2}{(y_S - y_{[2]U})(y_S + y_{[2]U})},$$

because $x^3 = y^2 - b$ for every point on the curve, so that $x_S^3 - x_{[2]U}^3 = y_S^2 - y_{[2]U}^2$. Since $(y_S - y_{[2]U})(y_S + y_{[2]U})$ lies in a subfield, the line function can be multiplied by $x_S^2 + x_S x_{[2]U} + x_{[2]U}^2$ instead of being divided by $v_{DBL(U)}(S)$. Analogously, the addition step becomes $f_{ADD(U,R)}(S) = l_{ADD(U,R)}(S)\cdot(x_S^2 + x_S x_{U+R} + x_{U+R}^2)$.
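The identity used by Lin et al. amounts to $x_S^3 - x_T^3 = y_S^2 - y_T^2$ for two points on $y^2 = x^3 + b$. The short Python sketch below checks it numerically on assumed toy parameters; it only illustrates the algebra and is not tied to any particular implementation.

```python
p, b = 1039, 5   # toy parameters for E: y^2 = x^3 + b (no cryptographic meaning)


def find_points(b, p, n):
    """First n affine points with y != 0 (distinct x) on y^2 = x^3 + b over F_p."""
    pts = []
    for x in range(1, p):
        rhs = (x ** 3 + b) % p
        y = pow(rhs, (p + 1) // 4, p)   # square root when one exists, since p = 3 (mod 4)
        if rhs and y * y % p == rhs:
            pts.append((x, y))
            if len(pts) == n:
                return pts
    raise ValueError("not enough points")


(x_S, y_S), (x_T, y_T) = find_points(b, p, 2)
quad = (x_S * x_S + x_S * x_T + x_T * x_T) % p
lhs = (x_S - x_T) * quad % p            # x_S^3 - x_T^3
rhs = (y_S - y_T) * (y_S + y_T) % p     # y_S^2 - y_T^2
print(lhs == rhs)  # True
if rhs:                                 # then 1/(x_S - x_T) = quad / rhs
    assert pow(x_S - x_T, -1, p) == quad * pow(rhs, -1, p) % p
```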
Point doubling and line computation. In projective coordinates $x_U = X_1/Z_1$ and $y_U = Y_1/Z_1$, we replace $X_1^3 = Y_1^2 Z_1 - bZ_1^3$ and factor $f_{DBL(U)}(S)$ to see that $f_{DBL(U)}(S)$ equals

$$\alpha \cdot \big(X_1Z_1(Y_1^2 - 9bZ_1^2)\cdot x_S + (4Y_1^2Z_1^2)\cdot x_S^2 - (6X_1^2Y_1Z_1)\cdot y_S + X_1^2(Y_1^2 + 9bZ_1^2)\big),$$

where $\alpha = (18bY_1^2Z_1^2 - 27b^2Z_1^4 + Y_1^4 + 8Y_1^3Z_1\cdot y_S)/(32Y_1^5Z_1^3) \in \mathbb{F}_{p^{k/3}}$ does not contain $x_S$ and can be discarded. The values $X_1$ and $Z_1$ are defined over subfields of $\mathbb{F}_{p^k}$, and we obtain more efficient formulas by computing $\tilde f_{DBL(U)}(S) = f_{DBL(U)}(S)\cdot X_1/(Z_1\alpha)$ as

$$\tilde f_{DBL(U)}(S) = X_1^2(Y_1^2 - 9bZ_1^2)\cdot x_S + 4X_1Y_1^2Z_1\cdot x_S^2 - 6X_1^3Y_1\cdot y_S + (Y_1^2 - bZ_1^2)(Y_1^2 + 9bZ_1^2).$$

For cubic twists, the term $x_S^2 \in \mathbb{F}_{p^k}$ appears in the simplified doubling line function, so we write $\tilde f_{DBL(U)}(S) = L_{1,0}\cdot x_S + L_{2,0}\cdot x_S^2 + L_{0,1}\cdot y_S + L_{0,0}$. We compute $(X_3 : Y_3 : Z_3) = [2](X_1 : Y_1 : Z_1)$ and the necessary $L_{i,j}$ coefficients using $6m + 7s + 1d_b$ in addition to the multiplications by $x_S$, $x_S^2$, and $y_S$:

$$A = X_1^2,\ B = Y_1^2,\ C = Z_1^2,\ D = bC,\ E = 3D,\ F = (X_1 + Y_1)^2 - A - B,$$
$$G = (Y_1 + Z_1)^2 - B - C,\ H = 3E,\ X_3 = F\cdot(B - H),$$
$$Y_3 = (B + H)^2 - 3(2E)^2,\ Z_3 = 4B\cdot G,\ L_{1,0} = A\cdot(B - H),\ L_{2,0} = F\cdot G,$$
$$L_{0,1} = -3A\cdot F,\ L_{0,0} = (B - D)\cdot(B + H).$$

Note that the formulas in [33] require $8m + 9s + 1d_b$ in addition to the multiplications by $x_S$, $x_S^2$, $y_S$, $y_S^2$, $x_Sy_S$, and $x_S^2y_S$, i.e., they need 6 multiplications costing $k/3$ base field multiplications each, while we only need 3 such multiplications. This means that the overall saving is $2m + 2s + km_1$.
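The doubling sequence and the scaled line $\tilde f_{DBL(U)}$ can be checked against the affine definitions. Taking $f_{DBL(U)}(S) = l_{DBL(U)}(S)\cdot(x_S^2 + x_Sx_{[2]U} + x_{[2]U}^2)$ as above, one can derive (this relation is our own consequence of the definitions, not stated in the text) that $\tilde f_{DBL(U)}(S)\,(y_S + y_{[2]U}) = 4X_1Y_1^2Z_1 \cdot l_{DBL(U)}(S)\,(x_S^2 + x_Sx_{[2]U} + x_{[2]U}^2)$ for every point $S$ on the curve. The Python sketch below tests this on a toy prime field; for simplicity $S$ is taken in $E(\mathbb{F}_p)$ instead of having $x_S \in \mathbb{F}_{p^k}$, which is enough because the relation is an algebraic identity. All parameters are assumptions.

```python
p, b = 1039, 5   # toy parameters for E: y^2 = x^3 + b (no cryptographic meaning)


def find_points(b, p, n):
    """First n affine points with y != 0 (distinct x) on y^2 = x^3 + b over F_p."""
    pts = []
    for x in range(1, p):
        rhs = (x ** 3 + b) % p
        y = pow(rhs, (p + 1) // 4, p)   # square root when one exists, since p = 3 (mod 4)
        if rhs and y * y % p == rhs:
            pts.append((x, y))
            if len(pts) == n:
                return pts
    raise ValueError("not enough points")


def dbl_cubic(X1, Y1, Z1, b, p):
    """[2](X1 : Y1 : Z1) and (L10, L20, L01, L00) with 6m + 7s + 1d_b."""
    A, B, C = X1 * X1 % p, Y1 * Y1 % p, Z1 * Z1 % p
    D = b * C % p
    E = 3 * D % p
    F = ((X1 + Y1) ** 2 - A - B) % p
    G = ((Y1 + Z1) ** 2 - B - C) % p
    H = 3 * E % p
    X3 = F * (B - H) % p
    Y3 = ((B + H) ** 2 - 3 * (2 * E) ** 2) % p
    Z3 = 4 * B * G % p
    coeffs = (A * (B - H) % p, F * G % p, -3 * A * F % p, (B - D) * (B + H) % p)
    return (X3, Y3, Z3), coeffs


(x1, y1), (x_S, y_S) = find_points(b, p, 2)
Z1 = 7
X1, Y1 = x1 * Z1 % p, y1 * Z1 % p
(X3, Y3, Z3), (L10, L20, L01, L00) = dbl_cubic(X1, Y1, Z1, b, p)

lam = 3 * x1 * x1 * pow(2 * y1, -1, p) % p          # affine reference values
x3 = (lam * lam - 2 * x1) % p
y3 = (lam * (x1 - x3) - y1) % p
z_inv = pow(Z3, -1, p)
same_point = (X3 * z_inv % p, Y3 * z_inv % p) == (x3, y3)

f_tilde = (L10 * x_S + L20 * x_S * x_S + L01 * y_S + L00) % p
l_S = (y_S - y1 - lam * (x_S - x1)) % p             # tangent line at U, evaluated at S
quad = (x_S * x_S + x_S * x3 + x3 * x3) % p
relation = f_tilde * (y_S + y3) % p == 4 * X1 * Y1 * Y1 * Z1 * l_S * quad % p
print(same_point, relation)  # True True
```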

Addition and line computation. For additions we break with the conventional wisdom that the line function should be given in terms of the base point. For even embedding degrees, where denominator elimination does not require further adjustment, that approach is suitable and particularly helps if the base point is given in affine coordinates. For the curves in this section we show that building the line function on the resulting point $(X_3 : Y_3 : Z_3)$ gives better operation counts in spite of $Z_3$ not being equal to 1.

The default line function is given by

$$f_{ADD(U,R)}(S) = \Big(\frac{y_1 - y_2}{x_1 - x_2}\cdot(x_1 - x_S) + y_S - y_1\Big)\Big/(x_3 - x_S).$$

Using the above denominator elimination technique this gets transformed to

$$\Big(\frac{y_1 - y_2}{x_1 - x_2}\cdot(x_1 - x_S) + y_S - y_1\Big)\cdot(x_3^2 + x_3x_S + x_S^2)/(y_3^2 - y_S^2).$$

This approach leads to a polynomial of the form $L_{2,0}\cdot x_S^2 + L_{1,0}\cdot x_S + L_{1,1}\cdot x_Sy_S + L_{2,1}\cdot x_S^2y_S + L_{0,2}\cdot y_S^2 + L_{0,1}\cdot y_S + L_{0,0}$, which requires $(6k/3)m_1$ after the computation of the coefficients $L_{i,j}$.

In the representation

$$\Big(\frac{y_1 + y_3}{x_1 - x_3}\cdot(x_3 - x_S) + y_S + y_3\Big)\cdot(x_3^2 + x_3x_S + x_S^2)/(y_3^2 - y_S^2),$$

using the coordinates $x_3, y_3$ instead of $x_1, y_1$, it becomes obvious that the factor $(x_3 - x_S)(x_3^2 + x_3x_S + x_S^2) = y_3^2 - y_S^2$ appears in the left term of the numerator and that thus the whole numerator is divisible by the subfield element $y_S + y_3$. (Note the sign change on $y_3$ because the line goes through $(x_3, -y_3)$ by the geometric addition law on $E$.) This means that the line function is of the form $L_{2,0}\cdot x_S^2 + L_{1,0}\cdot x_S + L_{0,1}\cdot y_S + L_{0,0}$, requiring only $(3k/3)m_1$ after the computation of the coefficients $L_{i,j}$.

We obtain in projective coordinates that $f_{ADD(U,R)}(S)$ equals

$$\frac{(Y_1Z_2 - Y_2Z_1)Z_3(Y_3 - y_SZ_3) + (X_3^2 + X_3Z_3x_S + Z_3^2x_S^2)(X_1Z_2 - X_2Z_1)}{(Y_3 - y_SZ_3)(X_1Z_2 - X_2Z_1)Z_3}.$$

The denominator can be discarded. To compute the numerator more efficiently we observe that $Z_3 = Z_1Z_2(X_1Z_2 - X_2Z_1)^3$, so that we can divide by $(X_1Z_2 - X_2Z_1)$; furthermore we scale the function by 2 to allow an s-m tradeoff. This gives

$$f_{ADD(U,R)}(S) = 2Z_3^2x_S^2 + 2X_3Z_3x_S - 2Z_1Z_2(X_1Z_2 - X_2Z_1)^2(Y_1Z_2 - Y_2Z_1)Z_3\,y_S + 2X_3^2 + 2Z_1Z_2(X_1Z_2 - X_2Z_1)^2(Y_1Z_2 - Y_2Z_1)Y_3.$$

We compute the addition and line computation using the following sequence of operations.

$$A = X_1\cdot Z_2,\ B = Y_1\cdot Z_2,\ C = Z_1\cdot Z_2,\ D = Z_1\cdot X_2 - A,\ E = B - Z_1\cdot Y_2,$$
$$F = D^2,\ G = E^2,\ H = -D\cdot F,\ I = F\cdot A,\ J = H + C\cdot G - 2I,$$
$$K = C\cdot F\cdot E,\ X_3 = -D\cdot J,\ Y_3 = E\cdot(I - J) - H\cdot B,\ Z_3 = C\cdot H,$$
$$L = X_3^2,\ M = Z_3^2,\ N = (X_3 + Z_3)^2 - L - M,\ L_{2,0} = 2M,\ L_{1,0} = N,$$
$$L_{0,0} = 2(L + K\cdot Y_3),\ L_{0,1} = -2K\cdot Z_3.$$

The explicit formulas for computing $(X_3 : Y_3 : Z_3)$ are the same as in the EFD [10]; they use $12m + 2s$ and the intermediate variables $A, \ldots, J$; the values $K, \ldots, N$ are used in the computation of the line function. The total operation count for the above sequence of operations is $16m + 5s$ in addition to the multiplications by $x_S^2$, $x_S$, and $y_S$. Mixed addition is cheaper, saving one $m$ in each of $A$, $B$, and $C$ and needing only $13m + 5s$.

In the pairing computation each addition is followed by a doubling. Thus $L = X_3^2$ and $M = Z_3^2$ should be cached and used in the doubling computation. This reuse reduces the effective cost of the addition step by $2s$, and similarly for the mixed-addition step. Accordingly we report $16m + 3s$ and $13m + 3s$ in the comparison in Section 7.
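For the addition step, the analogous relation is $f_{ADD(U,R)}(S)\,(y_S + y_3) = 2Z_3^2 \cdot l_{ADD(U,R)}(S)\,(x_S^2 + x_Sx_3 + x_3^2)$, which follows from the derivation above (division by $y_S + y_3$ and scaling by $2Z_3^2$). The Python sketch below runs the operation sequence on a toy prime field and checks both the resulting point and this relation; the parameters and helper names are assumptions for illustration, and $S$ is taken in $E(\mathbb{F}_p)$ because the relation is algebraic.

```python
p, b = 1039, 5   # toy parameters for E: y^2 = x^3 + b (no cryptographic meaning)


def find_points(b, p, n):
    """First n affine points with y != 0 (distinct x) on y^2 = x^3 + b over F_p."""
    pts = []
    for x in range(1, p):
        rhs = (x ** 3 + b) % p
        y = pow(rhs, (p + 1) // 4, p)   # square root when one exists, since p = 3 (mod 4)
        if rhs and y * y % p == rhs:
            pts.append((x, y))
            if len(pts) == n:
                return pts
    raise ValueError("not enough points")


def add_cubic(U, R, p):
    """U + R in projective coordinates and (L20, L10, L01, L00): 16m + 5s."""
    (X1, Y1, Z1), (X2, Y2, Z2) = U, R
    A, B, C = X1 * Z2 % p, Y1 * Z2 % p, Z1 * Z2 % p
    D = (Z1 * X2 - A) % p
    E = (B - Z1 * Y2) % p
    F, G = D * D % p, E * E % p
    H = -D * F % p
    I = F * A % p
    J = (H + C * G - 2 * I) % p
    K = C * F * E % p
    X3 = -D * J % p
    Y3 = (E * (I - J) - H * B) % p
    Z3 = C * H % p
    L, M = X3 * X3 % p, Z3 * Z3 % p
    N = ((X3 + Z3) ** 2 - L - M) % p
    return (X3, Y3, Z3), (2 * M % p, N, -2 * K * Z3 % p, 2 * (L + K * Y3) % p)


u, r, (x_S, y_S) = find_points(b, p, 3)
U = (u[0] * 2 % p, u[1] * 2 % p, 2)                 # projective, Z1 = 2
R = (r[0] * 5 % p, r[1] * 5 % p, 5)                 # projective, Z2 = 5
(X3, Y3, Z3), (L20, L10, L01, L00) = add_cubic(U, R, p)

lam = (u[1] - r[1]) * pow(u[0] - r[0], -1, p) % p   # affine reference values
x3 = (lam * lam - u[0] - r[0]) % p
y3 = (lam * (u[0] - x3) - u[1]) % p
z_inv = pow(Z3, -1, p)
same_point = (X3 * z_inv % p, Y3 * z_inv % p) == (x3, y3)

f = (L20 * x_S * x_S + L10 * x_S + L01 * y_S + L00) % p
l_S = (y_S - u[1] - lam * (x_S - u[0])) % p         # line through U and R, evaluated at S
quad = (x_S * x_S + x_S * x3 + x3 * x3) % p
relation = f * (y_S + y3) % p == 2 * Z3 * Z3 * l_S * quad % p
print(same_point, relation)  # True True
```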

7 Comparisons 

This section compares the speed of our pairing formulas with the literature in 
the following categories: 
- (i) Curves of the form $y^2 = x^3 + ax$ have twists of degree $d = 2$ and $4$. We compare operation counts with the results given by Ionica and Joux [27] and Arene et al. [2]; note that those papers cover general Weierstrass curves, but we are not aware of any other study covering this case.

- (ii) Curves of the form $y^2 = x^3 + c^2$ have a point of order 3 and admit twists of degrees $d = 2$ and $6$. These curves were studied in detail very recently by Costello et al. in [13], and we only found faster mixed addition formulas than those originally proposed.

- (iii) Curves of the form $y^2 = x^3 + b$ do not necessarily have a point of order 3. We study operation counts for twists of degree 2 and 6. These curves cover in particular BN curves [5]. We compare our new formulas with those given for the same curve shape in [2].

- (iv) Curves of the form $y^2 = x^3 + b$ also have twists of degree 3. This case requires very different optimizations and has not been studied much in the literature. The first paper studying pairing computation on curves admitting cubic twists [31] did not pay close attention to the operation count itself, so we compare our formulas with the results presented by El Mrabet et al. in [33], although that paper did not present addition formulas.

The above papers for even $d$ give $km_1$ for evaluating the line function. This can almost always be done in $(2k/d)m_1$, so we adjust their results accordingly. In the general addition case, Table 3 only gives counts for the Tate pairing. For $d = 2$ it is possible to save $1m$ in each ADD and each mADD. For the ate pairing in this case the costs are different, and the operation counts should be modified by $-(2k/d)m_1 + 2m$.

For mixed additions we use our improved precomputations, assuming that one of the input points is fixed.


Table 3. Comparisons of our pairing formulas with the previous fastest formulas

| Curve (curve order; twist degree) | Best (coord.) | DBL | ADD | mADD | Prev. best (coord.) | DBL | ADD | mADD |
| (i) $y^2 = x^3 + ax$ (-; $d = 2, 4$) | Sec. 4 (W(1,2)) | $(2k/d)m_1 + 2m + 8s + 1d_a$ | $(2k/d)m_1 + 12m + 7s$ | $(2k/d)m_1 + 9m + 5s$ | [27], [2] (J) | $(2k/d)m_1 + 1m + 11s + 1d_a$ | $(2k/d)m_1 + 10m + 6s$ | $(2k/d)m_1 + 7m + 6s$ |
| (ii) $y^2 = x^3 + c^2$ ($3 \mid \#E$; $d = 2, 6$) | Sec. 5 & [13] (P) | $(2k/d)m_1 + 3m + 5s$ | $(2k/d)m_1 + 14m + 2s + 1d_c$ | $(2k/d)m_1 + 10m + 2s + 1d_c$ | [13] (P) | $(2k/d)m_1 + 3m + 5s$ | $(2k/d)m_1 + 14m + 2s + 1d_c$ | $(2k/d)m_1 + 11m + 2s + 1d_c$ |
| (iii) $y^2 = x^3 + b$ ($3 \nmid \#E$; $d = 2, 6$) | Sec. 5 & [13] (P) | $(2k/d)m_1 + 2m + 7s + 1d_b$ | $(2k/d)m_1 + 14m + 2s$ | $(2k/d)m_1 + 10m + 2s$ | [2] (J) | $(2k/d)m_1 + 3m + 8s$ | $(2k/d)m_1 + 10m + 6s$ | $(2k/d)m_1 + 7m + 6s$ |
| (iv) $y^2 = x^3 + b$ (-; $d = 3$) | Sec. 6 (P) | $km_1 + 6m + 7s + 1d_b$ | $km_1 + 16m + 3s$ | $km_1 + 13m + 3s$ | [33] | $2km_1 + 8m + 9s + 1d_b$ | not reported | not reported |
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We point out that all new doublings are faster than the previous ones. In 
(i) and (iii) this comes at the expense of somewhat slower additions. In the 
Miller loop, doublings are significantly more frequent than additions so that this 
disadvantage is amply mitigated by the faster doublings. Note that the doublings 
save entire field operations, and do not just present s-m tradeoffs. 

In Table 4 we determine the operation counts for both the Tate and ate pairings in a typical iteration of Miller's algorithm, based on the fastest operation counts summarized in Table 3. In optimized pairing implementations, the loop parameter is chosen to have a low Hamming weight so that only few additions are encountered throughout the loop. Thus, the operation counts presented in Table 4 are for the doubling stage of Miller's algorithm. The column titled Tate gives the equivalent number of total base field operations (multiplications and squarings in $\mathbb{F}_p$) for a Miller iteration, based on the fact that the first argument is $R \in E(\mathbb{F}_p)$ and the second argument is $S \in E(\mathbb{F}_{p^k})$; for the fields of the individual coordinates see Table 1. The column titled ate gives the equivalent number of base field operations for an iteration where the first argument is $R \in E'(\mathbb{F}_{p^e})$ and the second argument is $S \in E'(\mathbb{F}_{p^k})$. If $s = 2^i3^j$, then we can quantify the cost of a multiplication in the field $\mathbb{F}_{p^s}$ as $3^i5^j$ multiplications in $\mathbb{F}_p$ using Karatsuba and/or Toom-Cook multiplication, and we do the same for squarings; cf. [29] for details. To compare across operations we follow the EFD [10] and report two sets of numbers: the first assumes that $1s = 1m$ and the second assumes that $1s = 0.8m$. In the second case, we assume that squarings in $\mathbb{F}_{p^k}$ do not make use of special properties of the field extension, so we approximate the ratio of squaring to multiplication costs by 0.8 there as well. In both cases we assume multiplications by curve constants to be virtually free.
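The cost model above can be replayed mechanically. The sketch below is our own reading of it, not code from the paper: it assumes that one doubling iteration costs the best DBL count of Table 3, plus one squaring and one full multiplication in $\mathbb{F}_{p^k}$ (the update $f \leftarrow f^2 \cdot l$), that multiplications by curve constants are free, and that the $m$, $s$ of Table 3 live in $\mathbb{F}_p$ for Tate and in $\mathbb{F}_{p^e}$, $e = k/d$, for ate. The names `DBL_COUNTS`, `ext_mult_cost` and `miller_doubling_cost` are hypothetical.

```python
from fractions import Fraction

# Best doubling counts from Table 3 per curve category: (#m, #s) in the field of
# the first pairing argument. The line-evaluation term is (2k/d) m_1 for (i)-(iii)
# and k m_1 for (iv); multiplications by curve constants are treated as free.
DBL_COUNTS = {"i": (2, 8), "ii": (3, 5), "iii": (2, 7), "iv": (6, 7)}


def ext_mult_cost(s):
    """F_p multiplications for one F_{p^s} multiplication, s = 2^i 3^j -> 3^i 5^j."""
    cost = 1
    while s % 2 == 0:
        s //= 2
        cost *= 3          # one Karatsuba level
    while s % 3 == 0:
        s //= 3
        cost *= 5          # one Toom-Cook level
    if s != 1:
        raise ValueError("extension degree must have the form 2^i 3^j")
    return cost


def miller_doubling_cost(k, d, category, pairing, sq_ratio=Fraction(1)):
    """Equivalent F_p operations for one doubling iteration of Miller's loop.

    pairing="tate": curve arithmetic in F_p (first argument R in E(F_p)).
    pairing="ate":  curve arithmetic in F_{p^e}, e = k/d (R on the twist E').
    Both add one squaring and one full multiplication in F_{p^k}.
    """
    n_m, n_s = DBL_COUNTS[category]
    line = k if category == "iv" else 2 * k // d
    field_cost = 1 if pairing == "tate" else ext_mult_cost(k // d)
    curve = (n_m + n_s * sq_ratio) * field_cost
    update = (sq_ratio + 1) * ext_mult_cost(k)
    return line + curve + update


# k = 12 (BN curves): E and E' are both in category (iii), d = 6.
print(miller_doubling_cost(12, 6, "iii", "tate"), miller_doubling_cost(12, 6, "iii", "ate"))
print(float(miller_doubling_cost(12, 6, "iii", "tate", Fraction(4, 5))))
```

For ate, pass the category of $E'$ rather than that of $E$ (for $k = 6$, for instance, Tate uses (ii) and ate uses (iii)). Exact `Fraction` arithmetic keeps the $s = 0.8m$ figures free of rounding noise.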

We use the optimal methods of curve construction for each embedding degree, which were originally presented in [18], to determine which categories ((i)-(iv)) $E$ and $E'$ belong to. We note that constructions 6.11-6.14 in [15] are due to [ZB].


Table 4. Comparison of optimal ate pairing and twisted ate pairing

| k | Const. | $\varphi(k)$ | $\rho$ | d | E | E' | $m_{opt} : T_e : r$ (log) | Tate : ate ($s = m$) | Tate : ate ($s = 0.8m$) | $a_{opt}$ vs. $\eta_{T_e}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 6.4 | 2 | 2.000 | 4 | (i) | (i) | 1 : 1 : 2 | 30 : 30 | 26.6 : 26.6 | Even |
| 6 | 6.6 | 2 | 2.000 | 6 | (ii) | (iii) | 1 : 1 : 2 | 40 : 41 | 36 : 36.6 | $\eta_{T_e}$ (1.02) |
| 8 | 6.10 | 4 | 1.500 | 4 | (i) | (i) | 3 : 3 : 4 | 68 : 88 | 61 : 77.8 | $\eta_{T_e}$ (1.3) |
| 9 | 6.6 | 6 | 1.333 | 3 | (iv) | (iv) | 1 : 3 : 6 | 72 : 124 | 65.6 : 112 | $a_{opt}$ (1.7) |
| 12 | 6.8 | 4 | 1.000 | 6 | (iii) | (iii) | 1 : 2 : 4 | 103 : 121 | 92.6 : 107.8 | $a_{opt}$ (1.7) |
| 16 | 6.11 | 8 | 1.250 | 4 | (i) | (i) | 1 : 4 : 8 | 180 : 260 | 162.2 : 229.4 | $a_{opt}$ (2.8) |
| 18 | 6.12 | 6 | 1.333 | 6 | (iii) | (ii) | 1 : 3 : 6 | 165 : 196 | 148.6 : 176 | $a_{opt}$ (2.5) |
| 24 | 6.6 | 8 | 1.250 | 6 | (ii) | (iii) | 1 : 4 : 8 | 286 : 359 | 258 : 319.4 | $a_{opt}$ (3.2) |
| 27 | 6.6 | 18 | 1.111 | 3 | (iv) | (iv) | 1 : 9 : 18 | 290 : 602 | 263.6 : 542 | $a_{opt}$ (4.4) |
| 32 | 6.13 | 16 | 1.125 | 4 | (i) | (i) | 1 : 8 : 16 | 512 : 772 | 461.8 : 680.2 | $a_{opt}$ (5.3) |
| 36 | 6.14 | 12 | 1.167 | 6 | (iii) | (iii) | 1 : 6 : 12 | 471 : 597 | 424.6 : 531 | $a_{opt}$ (4.7) |
| 48 | 6.6 | 16 | 1.125 | 6 | (ii) | (iii) | 1 : 8 : 16 | 834 : 1069 | 752 : 950.2 | $a_{opt}$ (6.2) |
The construction of BN curves for $k = 12$ was given in [S], and construction 6.10 for $k = 8$ curves is due to [H]. For each embedding degree, we also present the loop length ratios $m_{opt} : T_e : r$, where $m_{opt}$ is the loop parameter of the optimal ate pairing, $T_e$ is the loop parameter of the twisted ate pairing and $r$ is the loop parameter of the standard Tate pairing. For all construction methods shown in Table 4 there is an optimal ate pairing achieving the minimal loop length in Miller's algorithm. For the twisted ate pairing we used the shortest loop length found by considering the powers of $(t - 1)^e \bmod r$. In the last column, we compare the optimal ate pairing and the twisted ate pairing and give a factor that approximates how many times faster the Miller loop is computed under the faster pairing option.
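The last column can be approximated by multiplying each option's loop length by its per-iteration cost. The sketch below assumes (our reading, consistent with the listed numbers) that the optimal ate pairing iterates at the ate cost and the twisted ate pairing at the Tate cost. It uses the $s = m$ figures and loop-length ratios from Table 4. `TABLE4` and `compare` are illustrative names. Since the ratios are only approximate log lengths, the factors should match the tabulated ones only up to rounding.

```python
from fractions import Fraction

# Rows of Table 4 (s = m): k -> (m_opt, T_e, Tate cost, ate cost), where m_opt : T_e
# are the loop-length ratios and the costs are per doubling iteration.
TABLE4 = {6: (1, 1, 40, 41), 8: (3, 3, 68, 88), 9: (1, 3, 72, 124),
          12: (1, 2, 103, 121), 16: (1, 4, 180, 260), 48: (1, 8, 834, 1069)}


def compare(k):
    """Return (faster pairing, speed-up factor) for embedding degree k."""
    m_opt, t_e, tate_cost, ate_cost = TABLE4[k]
    opt_ate = m_opt * ate_cost          # optimal ate: point arithmetic on the twist
    twisted_ate = t_e * tate_cost       # twisted ate: point arithmetic in E(F_p)
    if opt_ate <= twisted_ate:
        return "a_opt", Fraction(twisted_ate, opt_ate)
    return "eta_Te", Fraction(opt_ate, twisted_ate)


for k in sorted(TABLE4):
    name, factor = compare(k)
    print(k, name, round(float(factor), 2))
```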
Abstract. This paper considers a generalized form for Hessian curves. The family of generalized Hessian curves covers more isomorphism classes of elliptic curves. Over a finite field $\mathbb{F}_q$, it is shown to be equivalent to the family of elliptic curves with a torsion subgroup isomorphic to $\mathbb{Z}/3\mathbb{Z}$.

This paper provides efficient unified addition formulas for generalized Hessian curves. The formulas even feature completeness for suitably chosen parameters.

This paper also presents extremely fast addition formulas for generalized binary Hessian curves. The fastest projective addition formulas require 9M + 3S, where M is the cost of a field multiplication and S is the cost of a field squaring. Moreover, very fast differential addition and doubling formulas are provided that need only 5M + 4S when the curve is chosen with small curve parameters.

Keywords: Elliptic curves, Hessian curves, cryptography. 


1 Introduction 

An elliptic curve $E$ over a field $\mathbb{F}$ can be given by the Weierstrass equation

$$E: Y^2 + a_1XY + a_3Y = X^3 + a_2X^2 + a_4X + a_6,$$

where the coefficients $a_1, a_2, a_3, a_4, a_6 \in \mathbb{F}$. Koblitz [26] and Miller [30] were the first to show that the group of rational points on an elliptic curve $E$ over a finite field $\mathbb{F}_q$ can be used for the discrete logarithm problem in a public-key cryptosystem.

There are many other ways to represent elliptic curves, such as the Legendre equation, cubic equations, quartic equations and the intersection of two quadratic surfaces I2I32I3KI. Several forms of elliptic curves over finite fields, with several coordinate systems, have been studied to improve the efficiency and speed of the group-law arithmetic (mainly the addition and doubling formulas) |2I4|.


P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010, LNCS 6056, pp. 243-260, 2010.
© International Association for Cryptologic Research 2010
Some unified addition formulas that also work for point doubling have been presented for several forms of elliptic curves, see e.g. paiSMftii iiirtis]. Overviews can be found in m-. Moreover, complete addition formulas that work for all pairs of inputs have been presented for Edwards curves over odd characteristic fields [5], and for binary Edwards curves.

A Hessian curve over a field $\mathbb{F}$ is defined by a symmetric cubic equation

$$X^3 + Y^3 + Z^3 = dXYZ,$$

where $d \in \mathbb{F}$ and $d^3 \neq 27$. The use of Hessian curves in cryptography has been studied in [13, 23, 33, 21, 22]. The Hessian addition formulas, the so-called Sylvester formulas, can also be used for point doubling after a permutation of the input coordinates, providing a weak form of unification. Moreover, the same formulas can be used to double, add, and subtract points, which makes Hessian curves interesting against side-channel attacks [25].

In this paper, we consider the family of curves, referred to as generalized Hessian curves, over a field $\mathbb{F}$ defined by the equation

$$X^3 + Y^3 + cZ^3 = dXYZ,$$

where $c, d \in \mathbb{F}$, $c \neq 0$ and $d^3 \neq 27c$. Clearly, this family covers more isomorphism classes of elliptic curves than Hessian curves. Notice that the Sylvester addition formulas work for the family of generalized Hessian curves, but these formulas are not unified. Starting from the Sylvester formulas and applying a suitable transformation of the input coordinates, we present fast and efficient unified addition formulas for generalized Hessian curves.

Nevertheless, the unified formulas for Hessian curves are not complete. In other words, there are some exceptional cases where the formulas fail to give the output. We study the exceptional cases of the addition formulas for generalized Hessian curves. We observe that the unified formulas are complete for many generalized Hessian curves, i.e., the addition formulas work for all pairs of inputs. In particular, the group of $\mathbb{F}$-rational points on a generalized Hessian curve has complete addition formulas if and only if $c$ is not a cube in $\mathbb{F}$. Also, the unified formulas are valid for all input points in rational subgroups $\mathcal{H}$ of generalized Hessian curves over finite fields $\mathbb{F}_q$ whenever $\gcd(\#\mathcal{H}, 3) = 1$.

For generalized binary Hessian curves, the unified addition formulas are the fastest known addition formulas on binary elliptic curves; for example, 9M + 3S for extended projective addition, 8M + 3S for extended mixed affine-projective addition, and 5M + 4S for mixed differential addition and doubling, when curves are chosen with small parameters. As usual, we use M to denote a field multiplication and S to denote a field squaring. Furthermore, the addition formulas are complete for generalized Hessian curves over $\mathbb{F}_{2^n}$ when $c$ is not a cube in $\mathbb{F}_{2^n}$. The mixed differential addition and doubling formulas are also complete.

Note. In [7], Bernstein, Kohel, and Lange define the twisted Hessian form. The 
twisted form is similar to the above form up to the order of the coordinates. 
Both forms present advantages. The neutral element on the twisted form is a finite point. In affine coordinates, the generalized form is fully symmetric and features a simpler inverse. See also [12, Exerc. 6.2].


2 Generalized Hessian Curves 

A Hessian curve over a field $\mathbb{F}$ is given by the cubic equation

$$H_d: x^3 + y^3 + 1 = dxy,$$

for some $d \in \mathbb{F}$ with $d^3 \neq 27$ [15]. This section considers the family of generalized Hessian curves, which cover more isomorphism classes of elliptic curves than Hessian curves. As will be shown, this family provides efficient unified addition formulas. Moreover, the unified formulas are complete for some generalized Hessian curves, i.e., the addition formulas work for all pairs of inputs.


2.1 Definition 

Definition 1. Let $c, d$ be elements of $\mathbb{F}$ such that $c \neq 0$ and $d^3 \neq 27c$. The generalized Hessian curve $H_{c,d}$ over $\mathbb{F}$ is defined by the equation

$$H_{c,d}: x^3 + y^3 + c = dxy.$$


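Clearly, a Hessian curve $H_d$ is the generalized Hessian curve $H_{1,d}$, i.e., the case $c = 1$. Moreover, via the map $(x, y) \mapsto (\bar{x}, \bar{y})$ defined by

$$\bar{x} = x/\zeta \quad\text{and}\quad \bar{y} = y/\zeta, \qquad (1)$$

with $\zeta^3 = c$, the generalized Hessian curve $H_{c,d}$ over $\mathbb{F}$ is isomorphic over $\bar{\mathbb{F}}$ to the Hessian curve $H_{d/\zeta}: \bar{x}^3 + \bar{y}^3 + 1 = (d/\zeta)\,\bar{x}\bar{y}$ (divide the equation of $H_{c,d}$ by $c = \zeta^3$). Therefore, for the $j$-invariant of $H_{c,d}$, we have

$$j(H_{c,d}) = j(H_{d/\zeta}) = \frac{1}{c}\left(\frac{d\,(d^3 + 6^3c)}{d^3 - 3^3c}\right)^3. \qquad (2)$$

We see that the curve $H_{c,d}$ over $\mathbb{F}$ is isomorphic over $\mathbb{F}$ to the curve $H_{d/\zeta}$ if $\zeta \in \mathbb{F}$. In other words, a generalized Hessian curve over $\mathbb{F}$ is isomorphic over $\mathbb{F}$ to a Hessian curve if and only if $c$ is a cube in $\mathbb{F}$.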
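A quick numerical check of map (1) and formula (2) is sketched below. Over a prime field with $p \equiv 2 \pmod 3$, every element is a cube, so $\zeta$ exists in $\mathbb{F}_p$ and the map sends affine points of $H_{c,d}$ to affine points of $H_{d/\zeta}$. The prime $p = 101$ and the parameters $c = 7$, $d = 10$ are arbitrary illustrative choices.

```python
p = 101                     # p = 2 (mod 3): every element of F_p is a cube
c, d = 7, 10                # hypothetical parameters with c != 0 and d^3 != 27c
assert c % p != 0 and (d**3 - 27 * c) % p != 0


def on_curve(x, y, c_, d_):
    """Affine equation of H_{c,d}: x^3 + y^3 + c = d x y (mod p)."""
    return (x**3 + y**3 + c_ - d_ * x * y) % p == 0


def j_invariant(c_, d_):
    """Formula (2): (1/c) * (d (d^3 + 6^3 c) / (d^3 - 3^3 c))^3 mod p."""
    t = d_ * (d_**3 + 216 * c_) * pow((d_**3 - 27 * c_) % p, -1, p)
    return pow(c_, -1, p) * pow(t % p, 3, p) % p


zeta = pow(c, (2 * p - 1) // 3, p)      # cube root of c, valid because p = 2 (mod 3)
assert pow(zeta, 3, p) == c % p
z_inv = pow(zeta, -1, p)
d_bar = d * z_inv % p                   # target curve H_{d/zeta}

points = [(x, y) for x in range(p) for y in range(p) if on_curve(x, y, c, d)]
images = [(x * z_inv % p, y * z_inv % p) for x, y in points]   # map (1)

assert all(on_curve(x, y, 1, d_bar) for x, y in images)
assert len(set(images)) == len(points)
assert j_invariant(c, d) == j_invariant(1, d_bar)
print(len(points), "affine points mapped onto H_{d/zeta}; j =", j_invariant(c, d))
```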

It is easy to adapt the addition and doubling formulas for generalized Hessian curves (see e.g. [12, Formulary], a.k.a. the Sylvester formulas). The sum of two (different) points $(x_1, y_1)$, $(x_2, y_2)$ on $H_{c,d}$ is the point $(x_3, y_3)$ given by

$$x_3 = \frac{y_1^2x_2 - y_2^2x_1}{x_2y_2 - x_1y_1} \quad\text{and}\quad y_3 = \frac{x_1^2y_2 - x_2^2y_1}{x_2y_2 - x_1y_1}. \qquad (3)$$

The doubling of the point $(x_1, y_1)$ on $H_{c,d}$ is the point $(x_3, y_3)$ given by

$$x_3 = \frac{y_1(c - x_1^3)}{x_1^3 - y_1^3} \quad\text{and}\quad y_3 = \frac{x_1(y_1^3 - c)}{x_1^3 - y_1^3}. \qquad (4)$$

Furthermore, the inverse of the point $(x_1, y_1)$ on $H_{c,d}$ is the point $(y_1, x_1)$.
The projective closure of the curve $H_{c,d}$ is

$$H_{c,d}: X^3 + Y^3 + cZ^3 = dXYZ.$$

It has the points $(1 : -\omega : 0)$ with $\omega^3 = 1$ at infinity. The neutral element of the group of $\mathbb{F}$-rational points of $H_{c,d}$ is the point at infinity $(1 : -1 : 0)$, which we denote by $\mathcal{O}$. For the point $P = (X_1 : Y_1 : Z_1)$ on $H_{c,d}$, we have $-P = (Y_1 : X_1 : Z_1)$.


Point addition. Using the addition formulas (3), whenever defined, the sum of the points $(X_1 : Y_1 : Z_1)$, $(X_2 : Y_2 : Z_2)$ on $H_{c,d}$ is the point $(X_3 : Y_3 : Z_3)$ with

$$X_3 = X_2Z_2Y_1^2 - X_1Z_1Y_2^2,\quad Y_3 = Y_2Z_2X_1^2 - Y_1Z_1X_2^2,\quad Z_3 = X_2Y_2Z_1^2 - X_1Y_1Z_2^2. \qquad (5)$$

The cost of the point addition algorithms in [13, 23, 33] is 12M. Moreover, these addition formulas can be evaluated in parallel, see [33]. In particular, one can evaluate the addition formulas (5) in a parallel environment using 3, 4 or 6 processors at a cost of 4M, 3M or 2M, respectively. To gain speedup, one can use the extended coordinates $(X : Y : Z : X^2 : Y^2 : Z^2 : 2XY : 2XZ : 2YZ)$. The addition algorithm in [22] uses this modified system of coordinates for Hessian curves over a field $\mathbb{F}$ of characteristic $p > 3$. This algorithm requires 6M + 6S.

Point doubling. The doubling of the point $(X_1 : Y_1 : Z_1)$ on $H_{c,d}$ is the point $(X_3 : Y_3 : Z_3)$ given by

$$X_3 = Y_1(cZ_1^3 - X_1^3),\quad Y_3 = X_1(Y_1^3 - cZ_1^3),\quad Z_3 = Z_1(X_1^3 - Y_1^3). \qquad (6)$$

From the doubling algorithm in [15], we obtain the following algorithm, which needs 6M + 3S + 1D, where D is the cost of a multiplication by the constant $c$:

$$A = X_1^2,\ B = Y_1^2,\ C = Z_1^2,\ D = X_1A,\ E = Y_1B,\ F = cZ_1C,$$
$$X_3 = Y_1(F - D),\quad Y_3 = X_1(E - F),\quad Z_3 = Z_1(D - E). \qquad (7)$$

Moreover, the cost of the following doubling algorithm for curves $H_{c,d}$ over a field $\mathbb{F}$ of characteristic $p \neq 2$ is 7M + 1S + 1D:

$$A = X_1Y_1,\ B = (X_1 + Y_1)^2 - 2A,\ C = (X_1 + Y_1)(B - A),$$
$$D = (X_1 - Y_1)(B + A),\ E = 3C - 2dAZ_1,$$
$$X_3 = Y_1(E + D),\quad Y_3 = X_1(D - E),\quad Z_3 = -2Z_1D. \qquad (8)$$

Also, one can evaluate the doubling formulas (6) with a cost of 3M + 3C + 1D, where C denotes a field cubing. Furthermore, for Hessian curves $H_{1,d}$ over a field $\mathbb{F}$ of characteristic $p \neq 2$, the doubling algorithms in [21, 22] use extended coordinates and require 3M + 6S.
2.2 Universality of the Model

We study the correspondence between generalized Hessian curves and elliptic curves having a torsion subgroup isomorphic to $\mathbb{Z}/3\mathbb{Z}$. In particular, we show that every elliptic curve over a finite field with a torsion subgroup isomorphic to $\mathbb{Z}/3\mathbb{Z}$ has an isomorphic generalized Hessian model.

Theorem 1. Let $E$ be an elliptic curve over a field $\mathbb{F}$. If the group $E(\mathbb{F})$ has a point of order 3, then $E$ is isomorphic over $\bar{\mathbb{F}}$ to a generalized Hessian curve. Moreover, if $\mathbb{F}$ has an element $\omega$ with $\omega^2 + \omega + 1 = 0$, then the group $E(\mathbb{F})$ has a point of order 3 if and only if $E$ is isomorphic over $\mathbb{F}$ to a generalized Hessian curve.

Proof. We note that the elliptic curve $E$ over $\mathbb{F}$ has a point of order 3 if and only if it has a Weierstrass model $E_{a_1,a_3}: y^2z + a_1xyz + a_3yz^2 = x^3$ (see e.g. [25]). Let $\omega \in \bar{\mathbb{F}}$ with $\omega^2 + \omega + 1 = 0$. Let $p$ be the characteristic of $\mathbb{F}$.

1. If $p \neq 3$, the elliptic curve $E_{a_1,a_3}$, via the map $(x, y, z) \mapsto (X, Y, Z)$ given by

$$X = \omega a_1x + (\omega - 1)y + (2\omega + 1)a_3z,\quad Y = -(\omega + 1)a_1x - (\omega + 2)y - (2\omega + 1)a_3z,\quad Z = x,$$

is isomorphic over $\mathbb{F}(\omega)$ to the generalized Hessian curve $H_{c,d}$ with $c = a_1^3 - 27a_3$ and $d = 3a_1$. On the other hand, the generalized Hessian curve $H_{c,d}$ is isomorphic over $\mathbb{F}(\omega)$ to the Weierstrass curve $E_{a_1,a_3}$ with $a_1 = d/3$, $a_3 = (d^3 - 27c)/3^6$.

2. If $p = 3$, the elliptic curve $E_{a_1,a_3}$, via the map $(x, y, z) \mapsto (X, Y, Z)$ given by

$$X = -a_3^2z,\quad Y = a_3(a_1x + y + a_3z),\quad Z = -y,$$

is isomorphic over $\mathbb{F}$ to the generalized Hessian curve $H_{c,d}$ with $c = a_3^3$ and $d = a_1^3$. Conversely, every generalized Hessian curve $H_{c,d}$ is isomorphic over $\mathbb{F}$ to the Weierstrass curve $E_{a_1,a_3}$ with $a_1 = \sqrt[3]{d}$, $a_3 = \sqrt[3]{c}$. □
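The change of variables in case 1 can be checked by brute force. The sketch below assumes a small prime $p \equiv 1 \pmod 3$, so that $\omega \in \mathbb{F}_p$, and uses arbitrary illustrative values $a_1 = 5$, $a_3 = 7$. It verifies that every projective point of $E_{a_1,a_3}$, including $(0 : 1 : 0)$, lands on $H_{c,d}$ with $c = a_1^3 - 27a_3$ and $d = 3a_1$, and that distinct points stay distinct. It checks the curve map only, not the group law.

```python
p = 103                      # p = 1 (mod 3), so a primitive cube root of unity lies in F_p
a1, a3 = 5, 7                # hypothetical E_{a1,a3}: y^2 z + a1 x y z + a3 y z^2 = x^3
assert a3 % p and (a1**3 - 27 * a3) % p            # nonsingular: a3 != 0, a1^3 != 27 a3

omega = next(w for w in range(2, p) if (w * w + w + 1) % p == 0)
c, d = (a1**3 - 27 * a3) % p, 3 * a1 % p           # parameters of H_{c,d} from Theorem 1


def on_E(x, y, z):
    return (y * y * z + a1 * x * y * z + a3 * y * z * z - x**3) % p == 0


def on_H(X, Y, Z):
    return (X**3 + Y**3 + c * Z**3 - d * X * Y * Z) % p == 0


def to_hessian(x, y, z):
    """Linear map from the proof of Theorem 1, case p != 3."""
    X = omega * a1 * x + (omega - 1) * y + (2 * omega + 1) * a3 * z
    Y = -(omega + 1) * a1 * x - (omega + 2) * y - (2 * omega + 1) * a3 * z
    return X % p, Y % p, x % p


def normalize(P):
    """Scale a nonzero projective triple so its last nonzero coordinate is 1."""
    for t in reversed(P):
        if t % p:
            inv = pow(t, -1, p)
            return tuple(u * inv % p for u in P)
    raise ValueError("(0, 0, 0) is not a projective point")


E_points = [(x, y, 1) for x in range(p) for y in range(p) if on_E(x, y, 1)]
E_points.append((0, 1, 0))                         # the point at infinity of E
images = [to_hessian(*P) for P in E_points]

assert all(on_H(*Q) for Q in images)
assert len({normalize(Q) for Q in images}) == len(E_points)   # injective on points
print(len(E_points), "points of E mapped into H_{c,d} with c =", c, "and d =", d)
```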

Remark 1. Consider the elliptic curve $E_{a_1,a_3}$ defined in the proof of Theorem 1. If $p \neq 3$ and $a_1^3 - 27a_3$ is a cube in $\mathbb{F}$, we let $c = 1$ and $d = 3(a_1 + 2\delta)/(a_1 - \delta)$, where $\delta^3 = a_1^3 - 27a_3$. Then the map $(x, y, z) \mapsto (X, Y, Z)$ given by

$$X = (2a_1 + \delta)x + 3y + 3a_3z,\quad Y = -(a_1 - \delta)x - 3y,\quad Z = -(a_1 - \delta)x - 3a_3z$$

is an isomorphism over $\mathbb{F}$ between $E_{a_1,a_3}$ and $H_{1,d}$.

Theorem 2. Let $E$ be an elliptic curve over a finite field $\mathbb{F}_q$. Then the group $E(\mathbb{F}_q)$ has a point of order 3 if and only if $E$ is isomorphic over $\mathbb{F}_q$ to a generalized Hessian curve.

Proof. If $q \equiv 0, 1 \pmod 3$, then the theorem is a direct consequence of Theorem 1. Next, we assume that $q \equiv 2 \pmod 3$, so every element of $\mathbb{F}_q$ is a cube. If the elliptic curve $E$ has an $\mathbb{F}_q$-rational point of order 3, then Remark 1 provides an isomorphism between $E$ and a generalized Hessian curve. Moreover, every generalized Hessian curve $H_{c,d}$ over $\mathbb{F}_q$ has the point $(-\zeta : 0 : 1)$ of order 3, where $\zeta^3 = c$ (see Section 4). □




R.R. Farashahi and M. Joye 


3 Unified Addition Formulas 

Let $H_{c,d}$ be a generalized Hessian curve over $\mathbb{F}$. We recall that the addition formulas (5) do not work to double a point. Hereafter, we give some unified addition formulas for $H_{c,d}$, where the doubling formulas can be derived directly from the addition formulas. The unified addition formulas make generalized Hessian curves interesting against side-channel attacks |2II9|.

Let $P_1 = (X_1 : Y_1 : Z_1)$ and $P_2 = (X_2 : Y_2 : Z_2)$ be two points of $H_{c,d}(\mathbb{F})$. Let also $T = (-\zeta : 0 : 1) \in H_{c,d}(\bar{\mathbb{F}})$ with $\zeta^3 = c$. Letting $Q_1 = P_1 + T$ and $Q_2 = P_2 - T$, we have $Q_1 = (\zeta Y_1 : \zeta^2Z_1 : X_1)$ and $Q_2 = (\zeta^2Z_2 : \zeta X_2 : Y_2)$. Clearly, $P_1 + P_2 = Q_1 + Q_2$. To compute $P_1 + P_2$, we use the addition formulas (5) with inputs $Q_1$ and $Q_2$. Doing so, we see that the sum of the points $(X_1 : Y_1 : Z_1)$ and $(X_2 : Y_2 : Z_2)$ on $H_{c,d}$ is the point $(X_3 : Y_3 : Z_3)$ given by

$$X_3 = cY_2Z_2Z_1^2 - X_1Y_1X_2^2,\quad Y_3 = X_2Y_2Y_1^2 - cX_1Z_1Z_2^2,\quad Z_3 = X_2Z_2X_1^2 - Y_1Z_1Y_2^2. \qquad (9)$$

These formulas work for doubling, i.e., they are unified addition formulas. We note that, by swapping the order of the points in the addition formulas (9), one can obtain the following unified formulas:

$$X_3 = cY_1Z_1Z_2^2 - X_2Y_2X_1^2,\quad Y_3 = X_1Y_1Y_2^2 - cX_2Z_2Z_1^2,\quad Z_3 = X_1Z_1X_2^2 - Y_2Z_2Y_1^2. \qquad (10)$$

The next algorithm evaluates the addition formulas (9) with 12M + 1D, where 1D denotes the multiplication by the constant $c$, which may be chosen small:

$$A = X_1X_2,\ B = Y_1Y_2,\ C = cZ_1Z_2,\ D = X_1Z_2,\ E = Y_1X_2,\ F = Z_1Y_2,$$
$$X_3 = CF - AE,\quad Y_3 = BE - CD,\quad Z_3 = AD - BF. \qquad (11)$$

It turns out that a mixed addition requires 10M + 1D by setting $Z_2 = 1$. Moreover, the addition formulas (9) can be evaluated in parallel, similarly to the algorithm proposed for the addition formulas (5) in [33].
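The following is a minimal sketch of algorithm (11) over a small prime field, with hypothetical parameters $p = 103$, $c = 2$, $d = 5$. It checks the unified property directly. Calling the addition with $P_1 = P_2$ reproduces the doubling formulas (6) exactly, both for the affine points and for the rational points at infinity.

```python
p, c, d = 103, 2, 5          # hypothetical small parameters, d^3 != 27c (mod p)
assert (d**3 - 27 * c) % p != 0


def add11(P1, P2):
    """Algorithm (11): unified addition on H_{c,d} using 12M + 1D."""
    X1, Y1, Z1 = P1
    X2, Y2, Z2 = P2
    A, B, C = X1 * X2 % p, Y1 * Y2 % p, c * Z1 * Z2 % p
    D, E, F = X1 * Z2 % p, Y1 * X2 % p, Z1 * Y2 % p
    return (C * F - A * E) % p, (B * E - C * D) % p, (A * D - B * F) % p


def dbl6(P):
    """Doubling formulas (6)."""
    X, Y, Z = P
    return (Y * (c * Z**3 - X**3) % p, X * (Y**3 - c * Z**3) % p, Z * (X**3 - Y**3) % p)


def on_curve(P):
    X, Y, Z = P
    return (X**3 + Y**3 + c * Z**3 - d * X * Y * Z) % p == 0


# Affine points plus the rational points at infinity (1 : -w : 0) with w^3 = 1.
points = [(x, y, 1) for x in range(p) for y in range(p) if on_curve((x, y, 1))]
points += [(1, -w % p, 0) for w in range(1, p) if pow(w, 3, p) == 1]

# Unified: feeding the same point twice reproduces the doubling formulas (6).
assert all(add11(P, P) == dbl6(P) for P in points)
# Every non-degenerate output is again a point of H_{c,d}.
for P in points[:25]:
    for Q in points[:25]:
        R = add11(P, Q)
        assert R == (0, 0, 0) or on_curve(R)
```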

When $\mathbb{F}$ is of characteristic $p \neq 2$, one can use the modified system of coordinates presented in [22, §2.4]. Applying it to the addition formulas (9), the sum of two points on $H_{c,d}$ represented by $(X_1 : Y_1 : Z_1 : A_1 : B_1 : C_1 : D_1 : E_1 : F_1)$ and $(X_2 : Y_2 : Z_2 : A_2 : B_2 : C_2 : D_2 : E_2 : F_2)$ with

$$A_i = X_i^2,\ B_i = Y_i^2,\ C_i = Z_i^2,\ D_i = 2X_iY_i,\ E_i = 2X_iZ_i,\ F_i = 2Y_iZ_i \quad (i = 1, 2),$$

is the point represented by $(X_3 : Y_3 : Z_3 : A_3 : B_3 : C_3 : D_3 : E_3 : F_3)$ given by

$$X_3 = cC_1F_2 - D_1A_2,\quad Y_3 = B_1D_2 - cE_1C_2,\quad Z_3 = A_1E_2 - F_1B_2,$$
$$A_3 = X_3^2,\ B_3 = Y_3^2,\ C_3 = Z_3^2,\ D_3 = (X_3 + Y_3)^2 - A_3 - B_3,$$
$$E_3 = (X_3 + Z_3)^2 - A_3 - C_3,\quad F_3 = (Y_3 + Z_3)^2 - B_3 - C_3. \qquad (12)$$
This algorithm requires 6M + 6S + 2D, where 2D represents the two multiplications by the constant $c$, which can be chosen small. Furthermore, the mixed addition formulas, obtained by setting $Z_2 = 1$, need 5M + 6S + 2D.

4 Complete Addition Formulas 

Again, we let $H_{c,d}$ denote a generalized Hessian curve over $\mathbb{F}$. In this section, we study the exceptional cases of the addition formulas (5), (9) and (10). In particular, we show that the addition formulas (9) and (10) work for all pairs of $\mathbb{F}$-rational points on $H_{c,d}$ whenever $c$ is not a cube in $\mathbb{F}$.

We consider the set of $\mathbb{F}$-rational points at infinity on $H_{c,d}$, denoted by $\infty$,

$$\infty = \{(1 : -\omega : 0) \mid \omega \in \mathbb{F},\ \omega^3 = 1\}.$$

We note that $\infty$ is a subgroup of the group of $\mathbb{F}$-rational points on $H_{c,d}$. Further, $\infty$ is a subgroup of the 3-torsion group $H_{c,d}[3]$, where

$$H_{c,d}[3] = \{P \mid P \in H_{c,d}(\mathbb{F}),\ 3P = \mathcal{O}\}.$$

Let $\mathcal{T}_1$, $\mathcal{T}_2$ be the sets of $\mathbb{F}$-rational points $P = (X : Y : Z)$ of $H_{c,d}[3]$ with $Y = 0$ and $X = 0$, respectively. Namely,

$$\mathcal{T}_1 = \{(-\zeta : 0 : 1) \mid \zeta \in \mathbb{F},\ \zeta^3 = c\} \quad\text{and}\quad \mathcal{T}_2 = \{-P \mid P \in \mathcal{T}_1\}.$$

Clearly, $H_{c,d}[3]$ is partitioned into $\infty \cup \mathcal{T}_1 \cup \mathcal{T}_2$.

The following proposition describes the exceptional cases of the addition formulas (5).

Proposition 1. The addition formulas (5) work for a pair of points $P_1, P_2$ on $H_{c,d}$ if and only if $P_1 - P_2$ is not a point at infinity.

Proof. Let $P_1 = (X_1 : Y_1 : Z_1)$ and $P_2 = (X_2 : Y_2 : Z_2)$ be points in $H_{c,d}(\mathbb{F})$.

First, assume that the addition formulas (5) do not work for the inputs $P_1$, $P_2$, i.e., we have $X_3 = Y_3 = Z_3 = 0$, where $X_3 = X_2Z_2Y_1^2 - X_1Z_1Y_2^2$, $Y_3 = Y_2Z_2X_1^2 - Y_1Z_1X_2^2$ and $Z_3 = X_2Y_2Z_1^2 - X_1Y_1Z_2^2$. We distinguish two cases to show that $P_1 - P_2 \in \infty$.

1. If $Z_1 = 0$ then $Z_3 = -X_1Y_1Z_2^2$. We see that $X_1Y_1 \neq 0$, since $P_1 \in H_{c,d}$. So $Z_2 = 0$. That means $P_1, P_2$ are in $\infty$. Therefore, $P_1 - P_2$ is a point at infinity.

2. Assume now that $Z_1 \neq 0$ and $Z_2 \neq 0$. We write $P_1 = (x_1 : y_1 : 1)$ and $P_2 = (x_2 : y_2 : 1)$, where $x_i = X_i/Z_i$ and $y_i = Y_i/Z_i$ ($i = 1, 2$). From $X_3 = Y_3 = Z_3 = 0$, we have $x_2y_1^2 = x_1y_2^2$, $y_2x_1^2 = y_1x_2^2$ and $x_1y_1 = x_2y_2$. So $y_1y_2(x_1^3 - x_2^3) = 0$ and $x_1x_2(y_1^3 - y_2^3) = 0$. Moreover, from the equation of $H_{c,d}$, we have $x_1^3 + y_1^3 = x_2^3 + y_2^3$.

If $x_1x_2 \neq 0$ then $y_1^3 = y_2^3$. Next, we assume that $x_1x_2 = 0$. If $x_1 = 0$ then $y_1 \neq 0$. From $X_3 = 0$, we remark that $x_2 = 0$. Then $x_1 = x_2 = 0$ implies that $y_1^3 = y_2^3$. Therefore, in all cases, we obtain $y_1^3 = y_2^3$ and $x_1^3 = x_2^3$. So we can write $y_2 = \omega_1y_1$ and $x_2 = \omega_2x_1$, where $\omega_1, \omega_2$ are third roots of unity. The condition $x_1y_1 = x_2y_2$ becomes $(\omega_1\omega_2 - 1)x_1y_1 = 0$. If $x_1y_1 \neq 0$ then $\omega_2 = \omega_1^{-1}$ and thus $P_1 - P_2 = (1 : -\omega_1 : 0)$. If $x_1 = 0$ then $x_2 = 0$ and $P_1 - P_2 = (1 : -\omega_1 : 0)$. Finally, if $y_1 = 0$ then $y_2 = 0$ and $P_1 - P_2 = (\omega_2 : -1 : 0)$. Summing up, we always have $P_1 - P_2 \in \infty$.

Now, we study the other direction. We assume that $P_1 - P_2 \in \infty$, where $P_1, P_2 \in H_{c,d}(\mathbb{F})$. Then $P_1 = P_2 + (1 : -\omega : 0) = (\omega X_2 : \omega^{-1}Y_2 : Z_2)$, where $\omega$ is a third root of unity. It is easily seen that the addition formulas (5) do not work for such $P_1, P_2$. □

We note that the addition formulas (5) work for all distinct pairs of $\mathbb{F}$-rational inputs if the curve $H_{c,d}$ over $\mathbb{F}$ has only one $\mathbb{F}$-rational point at infinity, i.e., if $\mathbb{F}$ has only one third root of unity. This happens for Hessian curves $H_{c,d}$ over $\mathbb{F}_q$ with $q \not\equiv 1 \pmod 3$ and, in particular, for binary curves $H_{c,d}$ over $\mathbb{F}_{2^n}$ with $n$ odd.

Proposition 2. The addition formulas (9) work for a pair of points $P_1, P_2$ on $H_{c,d}$ if and only if $P_1 - P_2 \notin \mathcal{T}_1$.

Proof. Let $P_1, P_2$ be points on $H_{c,d}$, let $T_1$ be a point of $\mathcal{T}_1$, and let $Q_1 = P_1 + T_1$ and $Q_2 = P_2 - T_1$. We note that the output of formulas (9) for the pair of points $P_1, P_2$ is equal, up to the nonzero factor $c$, to the output of formulas (5) for the pair of points $Q_1, Q_2$. From Proposition 1, we see that the formulas (9) do not work for the pair of points $P_1, P_2$ if and only if $Q_1 - Q_2 \in \infty$. This is equivalent to $P_1 - P_2 \in \mathcal{T}_1$. □
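Proposition 2 can be tested exhaustively on a toy curve. The sketch below picks $c = 27 = 3^3$, a cube, over $\mathbb{F}_{103}$, so that $\mathcal{T}_1$ has three rational points. For every ordered pair it checks that formulas (9) return $(0, 0, 0)$ exactly when $P_1 = P_2 + T$ for some $T = (-\zeta : 0 : 1)$, using the translation $P + T = (\zeta Y : \zeta^2 Z : X)$ from Section 3. Parameters and helper names are illustrative.

```python
p, d = 103, 5                 # p = 1 (mod 3): three cube roots of unity in F_p
c = 27                        # hypothetical c = 3^3, a cube, so T_1 is nonempty
assert (d**3 - 27 * c) % p != 0


def add9(P1, P2):
    """Unified addition formulas (9) over F_p."""
    (X1, Y1, Z1), (X2, Y2, Z2) = P1, P2
    return ((c * Y2 * Z2 * Z1**2 - X1 * Y1 * X2**2) % p,
            (X2 * Y2 * Y1**2 - c * X1 * Z1 * Z2**2) % p,
            (X2 * Z2 * X1**2 - Y1 * Z1 * Y2**2) % p)


def on_curve(P):
    X, Y, Z = P
    return (X**3 + Y**3 + c * Z**3 - d * X * Y * Z) % p == 0


def same_point(P, Q):
    """Projective equality of nonzero triples: all 2x2 minors vanish."""
    return all((P[i] * Q[j] - P[j] * Q[i]) % p == 0 for i in range(3) for j in range(i + 1, 3))


def plus_T(P, zeta):
    """P + T for T = (-zeta : 0 : 1), via Q_1 = (zeta Y : zeta^2 Z : X) of Section 3."""
    X, Y, Z = P
    return zeta * Y % p, zeta * zeta * Z % p, X % p


zetas = [z for z in range(1, p) if pow(z, 3, p) == c % p]
points = [(x, y, 1) for x in range(p) for y in range(p) if on_curve((x, y, 1))]
points += [(1, -w % p, 0) for w in range(1, p) if pow(w, 3, p) == 1]

failures = 0
for P1 in points:
    for P2 in points:
        fails = add9(P1, P2) == (0, 0, 0)
        in_T1 = any(same_point(P1, plus_T(P2, z)) for z in zetas)   # P1 - P2 in T_1
        assert fails == in_T1
        failures += fails
print(len(zetas), "points in T_1;", failures, "exceptional ordered pairs")
```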

Similarly, the addition formulas (10) work for all pairs of points $P_1, P_2$ on $H_{c,d}$ with $P_1 - P_2 \notin \mathcal{T}_2$. Since the sets $\mathcal{T}_1$ and $\mathcal{T}_2$ are disjoint, if the addition formulas (9) fail to compute the sum of two points, then the addition formulas (10) work to compute this sum. Clearly, this is also true the other way round. In other words, if the addition formulas (9) do not work for the pair of inputs $P_1, P_2$, then they work for the pair of inputs $P_2, P_1$.

Corollary 1. The doubling formulas (6) for the generalized Hessian curve $H_{c,d}$ work for all inputs.

Proof. The doubling formulas (6) can be obtained from the addition formulas (9) by letting $P_2 = P_1$. Then, from Proposition 2, we see that these doubling formulas work for all points on $H_{c,d}$. □

Corollary 2. Assume $\mathcal{H}$ is a subgroup of $H_{c,d}(\mathbb{F})$ which is disjoint from $\mathcal{T}_1$. Then the addition formulas (9) and (10) work for all pairs of points in $\mathcal{H}$.

Proof. Clearly, $\mathcal{H}$ and $\mathcal{T}_2$ are disjoint as well. Then Proposition 2 concludes the proof. □

Here, we describe the family of complete generalized Hessian curves. By a complete curve, we mean a curve with complete addition formulas, i.e., a curve over a field $\mathbb{F}$ with addition formulas that are valid for every pair of $\mathbb{F}$-rational points.
Theorem 3. Let $c, d$ be elements of $\mathbb{F}$ such that $d^3 \neq 27c$. Let $H_{c,d}$ be the generalized Hessian curve over $\mathbb{F}$ with the addition formulas (9). Then $H_{c,d}$ is complete over $\mathbb{F}$ if and only if $c$ is not a cube in $\mathbb{F}$.

Proof. By definition of $\mathcal{T}_1$, we see that the set of $\mathbb{F}$-rational points of $\mathcal{T}_1$ is empty if and only if $c$ is not a cube in $\mathbb{F}$. By Proposition 2, the addition formulas (9) work for all pairs of $\mathbb{F}$-rational points if and only if the set of $\mathbb{F}$-rational points of $\mathcal{T}_1$ is empty, which completes the proof. □

Below, we give two examples of generalized Hessian curves over finite fields with complete addition formulas.

Example 1. Let $c, d$ be elements of the finite field $\mathbb{F}_q$ with $q \equiv 1 \pmod 3$ such that $d^3 \neq 27c$ and $c$ is not a cube in $\mathbb{F}_q$. Then the generalized Hessian curve $H_{c,d}$ over $\mathbb{F}_q$ is complete with the addition formulas (9) or (10).

Example 2. Let $c, d$ be elements of $\mathbb{F}_q$ such that $c \neq 0$ and $d^3 \neq 27c$. Let $\mathcal{H}$ be a subgroup of $H_{c,d}(\mathbb{F}_q)$ with $\gcd(\#\mathcal{H}, 3) = 1$. Then $\mathcal{H}$ is complete over $\mathbb{F}_q$ with the addition formulas (9) or (10).
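Conversely, when $c$ is not a cube, Theorem 3 predicts that no pair is exceptional. The brute-force sketch below works over $\mathbb{F}_{103}$ with illustrative parameters ($c$ is the smallest non-cube found by the code). It checks that formulas (9) never return $(0, 0, 0)$, that every output lies on the curve, and that (9) and its swapped-order variant (10) give the same projective point.

```python
p, d = 103, 5                 # p = 1 (mod 3), so non-cubes exist in F_p
c = next(a for a in range(2, p) if pow(a, (p - 1) // 3, p) != 1)   # smallest non-cube
assert (d**3 - 27 * c) % p != 0


def add9(P1, P2):
    """Unified addition formulas (9); add9(P2, P1) evaluates formulas (10)."""
    (X1, Y1, Z1), (X2, Y2, Z2) = P1, P2
    return ((c * Y2 * Z2 * Z1**2 - X1 * Y1 * X2**2) % p,
            (X2 * Y2 * Y1**2 - c * X1 * Z1 * Z2**2) % p,
            (X2 * Z2 * X1**2 - Y1 * Z1 * Y2**2) % p)


def on_curve(P):
    X, Y, Z = P
    return (X**3 + Y**3 + c * Z**3 - d * X * Y * Z) % p == 0


def same_point(P, Q):
    """Projective equality of nonzero triples: all 2x2 minors vanish."""
    return all((P[i] * Q[j] - P[j] * Q[i]) % p == 0 for i in range(3) for j in range(i + 1, 3))


points = [(x, y, 1) for x in range(p) for y in range(p) if on_curve((x, y, 1))]
points += [(1, -w % p, 0) for w in range(1, p) if pow(w, 3, p) == 1]

for P1 in points:
    for P2 in points:
        R = add9(P1, P2)
        assert R != (0, 0, 0)                   # no exceptional pairs (Theorem 3)
        assert on_curve(R)
        assert same_point(R, add9(P2, P1))      # (9) and (10) agree
print(len(points), "points; all", len(points) ** 2, "ordered pairs handled")
```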

5 Explicit Formulas in Characteristic 2 

In this section, we present fast and efficient addition, doubling, tripling and 
differential addition formulas for generalized binary Hessian curves over a field 
F of characteristic p = 2. 

5.1 Addition 

We recall that the cost of the point addition algorithms in [13, 33] for the addition formulas (5) is 12M. Also, the addition algorithm (11) requires 12M + 1D. One may choose the constant $c$ small to reduce the cost of this algorithm to 12M. Further, the addition algorithm (11) is unified. Furthermore, it features completeness for generalized binary Hessian curves $H_{c,d}$ over $\mathbb{F}_{2^n}$, where $n$ is even and $c$ is not a cube in $\mathbb{F}_{2^n}$.

Moreover, one can use the extended coordinates $(X : Y : Z : X^2 : Y^2 : Z^2 : XY : XZ : YZ)$. Here, the sum of two points on $H_{c,d}$ represented by $(X_1 : Y_1 : Z_1 : A_1 : B_1 : C_1 : D_1 : E_1 : F_1)$ and $(X_2 : Y_2 : Z_2 : A_2 : B_2 : C_2 : D_2 : E_2 : F_2)$, where

$$A_i = X_i^2,\ B_i = Y_i^2,\ C_i = Z_i^2,\ D_i = X_iY_i,\ E_i = X_iZ_i,\ F_i = Y_iZ_i \quad (i = 1, 2),$$

is the point represented by $(X_3 : Y_3 : Z_3 : A_3 : B_3 : C_3 : D_3 : E_3 : F_3)$ given by

$$X_3 = cC_1F_2 + D_1A_2,\quad Y_3 = B_1D_2 + cE_1C_2,\quad Z_3 = A_1E_2 + F_1B_2,$$
$$A_3 = X_3^2,\ B_3 = Y_3^2,\ C_3 = Z_3^2,\ D_3 = X_3Y_3,\ E_3 = X_3Z_3,\ F_3 = Y_3Z_3. \qquad (13)$$
This algorithm requires 9M + 3S + 2D, where the two D are multiplications by the constant $c$. We note that the algorithm (13) is obtained from the addition formulas (9), so it is unified and works for point doublings as well. Moreover, it works for all pairs of inputs on a complete curve (cf. Theorem 3). Furthermore, the mixed addition formulas need 8M + 3S + 2D by setting $Z_2 = 1$. If $c$ is small, then one can evaluate the addition algorithm in a parallel environment using 3, 4 or 6 processors, which needs 3M + 1S, 3M or 2M, respectively.
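To illustrate formulas (13) in characteristic 2, the sketch below works in $\mathbb{F}_{16} = \mathbb{F}_2[t]/(t^4 + t + 1)$. This is an arbitrary small field with $n = 4$ even, so that non-cubes exist. The choice $d = 3$ (the element $t + 1$) is hypothetical. The sketch checks that the algorithm doubles correctly when given the same point twice, agreeing with the doubling formulas (14) of Section 5.2. It also checks that, with $c$ a non-cube, every sum is a valid point.

```python
MOD = 0b10011          # F_16 = F_2[t]/(t^4 + t + 1); illustrative field choice


def gmul(a, b):
    """Multiply two elements of F_16 (bit i = coefficient of t^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return r


def gpow(a, n):
    r = 1
    for _ in range(n):
        r = gmul(r, a)
    return r


c = next(a for a in range(2, 16) if gpow(a, 5) != 1)   # a non-cube: 3 divides 16 - 1
d = 3                                                  # hypothetical curve parameter


def on_curve(P):
    """X^3 + Y^3 + cZ^3 + dXYZ = 0 (signs vanish in characteristic 2)."""
    X, Y, Z = P
    return gpow(X, 3) ^ gpow(Y, 3) ^ gmul(c, gpow(Z, 3)) ^ gmul(d, gmul(gmul(X, Y), Z)) == 0


def ext(P):
    """Extended coordinates (X : Y : Z : X^2 : Y^2 : Z^2 : XY : XZ : YZ)."""
    X, Y, Z = P
    return (X, Y, Z, gmul(X, X), gmul(Y, Y), gmul(Z, Z), gmul(X, Y), gmul(X, Z), gmul(Y, Z))


def add13(P, Q):
    """Formulas (13): 6M + 2D for X3, Y3, Z3, then 3S + 3M inside ext()."""
    _, _, _, A1, B1, C1, D1, E1, F1 = P
    _, _, _, A2, B2, C2, D2, E2, F2 = Q
    X3 = gmul(c, gmul(C1, F2)) ^ gmul(D1, A2)
    Y3 = gmul(B1, D2) ^ gmul(c, gmul(E1, C2))
    Z3 = gmul(A1, E2) ^ gmul(F1, B2)
    return ext((X3, Y3, Z3))


def dbl14(P):
    """Doubling formulas (14)."""
    X, Y, Z = P
    H = gmul(d, gmul(gmul(X, Y), Z))
    return gpow(Y, 4) ^ gmul(H, Y), gpow(X, 4) ^ gmul(H, X), gmul(c, gpow(Z, 4)) ^ gmul(H, Z)


points = [(x, y, 1) for x in range(16) for y in range(16) if on_curve((x, y, 1))]
points += [(1, w, 0) for w in range(1, 16) if gpow(w, 3) == 1]   # -w = w here

for P in points:
    assert add13(ext(P), ext(P))[:3] == dbl14(P)       # unified
    for Q in points:
        R = add13(ext(P), ext(Q))[:3]
        assert R != (0, 0, 0) and on_curve(R)          # complete for non-cube c
print(len(points), "points, all", len(points) ** 2, "sums valid")
```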

Table 1 lists the complexities of addition formulas for different shapes of binary elliptic curves and different coordinate systems. As Table 1 shows, the generalized Hessian curves provide the fastest addition formulas for binary elliptic curves. Moreover, our formulas for Hessian curves are unified. They are even complete for many generalized Hessian curves. We note that none of the addition formulas for short Weierstrass curves is unified, whereas binary Edwards curves provide unified and even complete formulas.


Table 1. Cost of addition formulas for different families of binary elliptic curves

| Curve shape | Representation | Projective addition | Mixed addition |
|---|---|---|---|
| Short Weierstrass $y^2 + xy = x^3 + a_2x^2 + a_6$ | Projective [4] | 14M + 1S + 1D | 11M + 1S + 1D |
| | Jacobian [1] | 14M + 5S + 1D | 10M + 3S + 1D |
| | Lopez-Dahab | 13M + 4S | 8M + 5S + 1D |
| | Extended Lopez-Dahab with $a_2 = 0$ | 14M + 3S | 9M + 4S + 1D |
| | Extended Lopez-Dahab with $a_2 = 1$ | 13M + 3S | 8M + 4S |
| Binary Edwards $d_1(x + y) + d_2(x^2 + y^2) = xy + xy(x + y) + x^2y^2$ | Projective | 18M + 2S + 7D | 13M + 3S + 3D |
| | Projective with $d_1 = d_2$ [6] | 16M + 1S + 4D | 13M + 3S + 3D |
| Hessian $x^3 + y^3 + 1 = dxy$ | Projective [13, 23, 33] | 12M | 10M |
| | Projective, formulas (11) | 12M | 10M |
| | Extended, formulas (13) | 9M + 3S | 8M + 3S |
| Generalized Hessian $x^3 + y^3 + c = dxy$ | Projective | 12M | 10M |
| | Projective, formulas (11) | 12M + 1D | 10M + 1D |
| | Extended, formulas (13) | 9M + 3S + 2D | 8M + 3S + 2D |


5.2 Doubling 

We recall that the doubling algorithm (7) needs 6M + 3S + 1D to evaluate the doubling formulas (6). Furthermore, from the doubling formulas (6) and the curve equation (which in characteristic 2 reads $X^3 + Y^3 = cZ^3 + dXYZ$), we see that the doubling of the point $(X_1 : Y_1 : Z_1)$ on $H_{c,d}$ is the point $(X_3 : Y_3 : Z_3)$ with

$$X_3 = Y_1^4 + dX_1Y_1^2Z_1,\quad Y_3 = X_1^4 + dX_1^2Y_1Z_1,\quad Z_3 = cZ_1^4 + dX_1Y_1Z_1^2. \qquad (14)$$

The following algorithm evaluates the doubling formulas (14) and requires 5M + 6S + 2D:

$$A = X_1^2,\ B = Y_1^2,\ C = Z_1^2,\ D = X_1Y_1,\ G = DZ_1,\ H = dG,$$
$$X_3 = B^2 + Y_1H,\quad Y_3 = A^2 + X_1H,\quad Z_3 = cC^2 + Z_1H.$$
Moreover, the doubling of the point $(X_1 : Y_1 : Z_1)$ on a binary curve $H_{c,d}$, using the representation $(X_1 : Y_1 : Z_1 : A_1 : B_1 : C_1 : D_1 : E_1 : F_1)$, where $A_1 = X_1^2$, $B_1 = Y_1^2$, $C_1 = Z_1^2$, $D_1 = X_1Y_1$, $E_1 = X_1Z_1$, $F_1 = Y_1Z_1$, is the point represented by $(X_3 : Y_3 : Z_3 : A_3 : B_3 : C_3 : D_3 : E_3 : F_3)$ given by

$$X_3 = B_1(B_1 + dE_1),\quad Y_3 = A_1(A_1 + dF_1),\quad Z_3 = (A_1 + B_1 + D_1)(E_1 + F_1),$$
$$A_3 = X_3^2,\ B_3 = Y_3^2,\ C_3 = Z_3^2,\ D_3 = X_3Y_3,\ E_3 = X_3Z_3,\ F_3 = Y_3Z_3.$$

The cost of the above doubling algorithm is 6M + 3S + 2D. We also note that the coordinates $D_3$, $E_3$ and $F_3$ can be given by

$$D_3 = D_1^4 + cdE_1^2F_1^2,\quad E_3 = cF_1^4 + dD_1^2E_1^2,\quad F_3 = cE_1^4 + dD_1^2F_1^2.$$

The following doubling algorithm needs fewer field multiplications:

$$G = A_1^2,\ H = B_1^2,\ I = C_1^2,\ J = D_1E_1,\ K = D_1F_1,\ L = E_1F_1,$$
$$X_3 = H + dK,\ Y_3 = G + dJ,\ Z_3 = cI + dL,\ A_3 = X_3^2,\ B_3 = Y_3^2,\ C_3 = Z_3^2,$$
$$R = D_1^2 + \sqrt{cd}\,L,\quad S = \sqrt{c}\,F_1^2 + \sqrt{d}\,J,\quad T = \sqrt{c}\,E_1^2 + \sqrt{d}\,K,$$
$$D_3 = R^2,\quad E_3 = S^2,\quad F_3 = T^2.$$

The above doubling algorithm needs 3M + 12S + 9D. This algorithm requires 3M + 12S + 6D if $c$ is small and 3M + 12S + 4D if $d$ is small.
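The doubling algorithms of this subsection should agree on every point of the curve, since each one only rewrites the formulas using the curve equation. The sketch below uses the illustrative field $\mathbb{F}_{16} = \mathbb{F}_2[t]/(t^4 + t + 1)$ with a non-cube $c$ and $d = 3$ (both hypothetical choices). It computes square roots as $a^8$, which is valid in $\mathbb{F}_{16}$ because $a^{16} = a$. It then compares the 5M + 6S + 2D, 6M + 3S + 2D and 3M + 12S + 9D versions. In practice $\sqrt{c}$, $\sqrt{d}$ and $\sqrt{cd}$ would be precomputed once, which the default arguments imitate.

```python
MOD = 0b10011          # F_16 = F_2[t]/(t^4 + t + 1); illustrative field choice


def gmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return r


def gpow(a, n):
    r = 1
    for _ in range(n):
        r = gmul(r, a)
    return r


def sq(a):
    return gmul(a, a)


def gsqrt(a):
    """Square root in F_16: (a^8)^2 = a^16 = a."""
    return gpow(a, 8)


c = next(a for a in range(2, 16) if gpow(a, 5) != 1)   # hypothetical non-cube c
d = 3                                                  # hypothetical d


def on_curve(P):
    X, Y, Z = P
    return gpow(X, 3) ^ gpow(Y, 3) ^ gmul(c, gpow(Z, 3)) ^ gmul(d, gmul(gmul(X, Y), Z)) == 0


def ext(P):
    X, Y, Z = P
    return (X, Y, Z, sq(X), sq(Y), sq(Z), gmul(X, Y), gmul(X, Z), gmul(Y, Z))


def dbl_5m6s(P):
    """Algorithm for (14) on (X : Y : Z): 5M + 6S + 2D."""
    X1, Y1, Z1 = P
    A, B, C = sq(X1), sq(Y1), sq(Z1)
    H = gmul(d, gmul(gmul(X1, Y1), Z1))
    return sq(B) ^ gmul(Y1, H), sq(A) ^ gmul(X1, H), gmul(c, sq(C)) ^ gmul(Z1, H)


def dbl_6m3s(P):
    """Extended doubling: 6M + 3S + 2D."""
    _, _, _, A1, B1, _, D1, E1, F1 = P
    X3 = gmul(B1, B1 ^ gmul(d, E1))
    Y3 = gmul(A1, A1 ^ gmul(d, F1))
    Z3 = gmul(A1 ^ B1 ^ D1, E1 ^ F1)
    return ext((X3, Y3, Z3))


def dbl_3m12s(P, rc=gsqrt(c), rd=gsqrt(d), rcd=gsqrt(gmul(c, d))):
    """Extended doubling: 3M + 12S + 9D (rc, rd, rcd are precomputed roots)."""
    _, _, _, A1, B1, C1, D1, E1, F1 = P
    G, H, I = sq(A1), sq(B1), sq(C1)
    J, K, L = gmul(D1, E1), gmul(D1, F1), gmul(E1, F1)
    X3, Y3, Z3 = H ^ gmul(d, K), G ^ gmul(d, J), gmul(c, I) ^ gmul(d, L)
    R = sq(D1) ^ gmul(rcd, L)
    S = gmul(rc, sq(F1)) ^ gmul(rd, J)
    T = gmul(rc, sq(E1)) ^ gmul(rd, K)
    return X3, Y3, Z3, sq(X3), sq(Y3), sq(Z3), sq(R), sq(S), sq(T)


points = [(x, y, 1) for x in range(16) for y in range(16) if on_curve((x, y, 1))]
points += [(1, w, 0) for w in range(1, 16) if gpow(w, 3) == 1]

for P in points:
    assert dbl_6m3s(ext(P))[:3] == dbl_5m6s(P)
    assert dbl_3m12s(ext(P)) == dbl_6m3s(ext(P))
print("all three doubling algorithms agree on", len(points), "points")
```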

Our doubling formulas slightly improve the current speed of doublings on Hessian curves. Moreover, the doubling formulas for generalized Hessian curves are faster than doubling formulas using projective coordinates in short Weierstrass form, see [2]. However, they are slower than various doubling formulas using the Jacobian [2] and Lopez-Dahab representations of short Weierstrass form EEnmm] and the projective representation of binary Edwards curves [B].

We note that the only complete doubling formulas are those for binary Edwards curves [B] and for generalized binary Hessian curves (see Corollary 1).

5.3 Tripling 

Here, we present fast tripling formulas for generalized binary Hessian curves. 
The tripling formulas can be used in double based number systems, DBNS; see 
e.g., [1413115) . For a point (Xx : Yi : Zx) on we have 3(X 3 : Yx : Z\) = 
(X 3 : Y 3 : Z 3 ) with 

X 3 = d(Yx 3 (Zx 3 + Xx 3 )(Xx 3 + Yi 3 ) + A' 1 3 (Y 1 3 + Y 1 3 )(Y 1 3 + Zx 3 )), 

Y 3 = d(Xx 3 (Yx 3 + Zx 3 )(Xx 3 + Yi 3 ) + Yx 3 (Zx 3 + X 1 3 )(Y 1 3 + Xi 3 )) , 

Y 3 = (Xx 3 + Yi 3 + Zx 3 )((Yx 3 + Zx 3 )(Z x 3 + Xx 3 ) + (X 3 3 + Yx 3 ) 2 ) . 

For generalized binary Hessian curves, we suggest the following formulas. If $d \neq 0$, let $e = d^{-1}$. The following algorithm computes $(X_3 : Y_3 : Z_3)$. It requires 7M + 6S + 3D, or 7M + 6S + 2D if either c or e is small:

$$A = X_1^3,\ B = Y_1^3,\ C = cZ_1^3,\ E = A^2,\ F = B^2,\ G = C^2,$$
$$H = (A + C)(F + G),\ I = (B + C)(E + G),\ J = (A + B)(E + F),$$
$$K = (A + B + C)(E + F + G),\ L = H + I + (1 + ce^3)K,$$
$$X_3 = H + J + L,\ Y_3 = I + J + L,\ Z_3 = eL.$$
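The following Python sketch checks the tripling algorithm on a toy curve. It is illustrative only, and the helper names are hypothetical. The assumptions are:

- The field is GF(2^7) with modulus $x^7 + x + 1$.
- The constants c and d are chosen arbitrarily, with $d^3 \neq c$ so that the curve is nonsingular.

The sketch checks two things:

- The output lies on $X^3 + Y^3 + cZ^3 = dXYZ$ for every affine point.
- For $c = 1$, the output agrees projectively with the displayed Hessian tripling formula.

For brevity, $e$ and $1 + ce^3$ are recomputed on every call; in practice they would be precomputed.

```python
# Illustrative check of the generalized binary Hessian tripling algorithm over
# a toy field GF(2^7) (assumed modulus x^7 + x + 1; not a secure size).
M_BITS = 7
POLY = 0b10000011


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:
            a ^= POLY
    return r


def gf_inv(a):
    """a^(2^m - 2) = a^(-1) for nonzero a (square-and-multiply)."""
    r, e = 1, (1 << M_BITS) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def cube(a):
    return gf_mul(a, gf_mul(a, a))


def triple(P, c, d):
    """3P on X^3 + Y^3 + cZ^3 = dXYZ via the algorithm (addition is XOR)."""
    e = gf_inv(d)                   # precomputed in practice
    lam = 1 ^ gf_mul(c, cube(e))    # 1 + c e^3, precomputed in practice
    X1, Y1, Z1 = P
    A, B, C = cube(X1), cube(Y1), gf_mul(c, cube(Z1))
    E, F, G = gf_mul(A, A), gf_mul(B, B), gf_mul(C, C)
    H = gf_mul(A ^ C, F ^ G)
    I = gf_mul(B ^ C, E ^ G)
    J = gf_mul(A ^ B, E ^ F)
    K = gf_mul(A ^ B ^ C, E ^ F ^ G)
    L = H ^ I ^ gf_mul(lam, K)
    return (H ^ J ^ L, I ^ J ^ L, gf_mul(e, L))


def triple_hessian_formula(P, d):
    """The displayed tripling formula for Hessian curves (c = 1)."""
    X1, Y1, Z1 = P
    a, b, g = cube(X1), cube(Y1), cube(Z1)
    X3 = gf_mul(d, gf_mul(b, gf_mul(g ^ a, a ^ b)) ^ gf_mul(a, gf_mul(b ^ g, b ^ g)))
    Y3 = gf_mul(d, gf_mul(a, gf_mul(b ^ g, a ^ b)) ^ gf_mul(b, gf_mul(g ^ a, g ^ a)))
    Z3 = gf_mul(a ^ b ^ g, gf_mul(b ^ g, g ^ a) ^ gf_mul(a ^ b, a ^ b))
    return (X3, Y3, Z3)


def on_curve(P, c, d):
    X, Y, Z = P
    lhs = cube(X) ^ cube(Y) ^ gf_mul(c, cube(Z))
    return lhs == gf_mul(d, gf_mul(X, gf_mul(Y, Z)))


def same_point(P, Q):
    """Projective equality by cross-multiplication."""
    (X, Y, Z), (U, V, W) = P, Q
    return (gf_mul(X, V) == gf_mul(U, Y) and gf_mul(Y, W) == gf_mul(V, Z)
            and gf_mul(X, W) == gf_mul(U, Z))


def affine_points(c, d):
    n = 1 << M_BITS
    return [(x, y, 1) for x in range(n) for y in range(n) if on_curve((x, y, 1), c, d)]
```

The neutral element $(1 : 1 : 0)$ is mapped to itself, as expected.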


5.4 Differential Addition 


We now devise differential addition formulas on binary Hessian curves using w-coordinates. For a point $(x, y)$ on the binary curve $H_{c,d}$, $w(x, y)$ is defined by a symmetric function of the coordinates x and y.

Differential addition in w-coordinates requires computing $w(P + Q)$ given $w(P)$, $w(Q)$ and $w(P - Q)$. Differential doubling requires computing $w(2P)$ given $w(P)$. We recall | 31 I 6 | that, using w-coordinate differential addition and doubling formulas, one can recursively compute $w((2m + 1)P)$ and $w(2mP)$ given $w(mP)$ and $w((m + 1)P)$.

Let $(x_2, y_2)$ be a point on $H_{c,d}$ and let $(x_4, y_4) = 2(x_2, y_2)$. Write $u_i = x_i + y_i$ and $v_i = x_iy_i$ for $i = 2, 4$. From the doubling formulas (jj]), we obtain

$$u_4 = \frac{u_2^4 + cd}{du_2^2 + c} \quad\text{and}\quad v_4 = \frac{v_2^4 + cdv_2^2}{d^2v_2^2 + c^2}. \qquad (15)$$


Assume that $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ and $(x_5, y_5)$ are affine points on $H_{c,d}$ satisfying $(x_1, y_1) = (x_3, y_3) - (x_2, y_2)$ and $(x_5, y_5) = (x_2, y_2) + (x_3, y_3)$. Write $u_i = x_i + y_i$ and $v_i = x_iy_i$ for $i = 1, 2, 3, 5$. Using the addition formulas ([ 3 ]), we obtain

$$u_1 + u_5 = \frac{u_2^2u_3^2 + du_2u_3(u_2 + u_3) + d^2u_2u_3}{d(u_2^2 + u_3^2) + u_2u_3(u_2 + u_3 + d) + c},$$
$$u_1u_5 = \frac{du_2^2u_3^2 + c(u_2^2 + u_3^2 + d^2)}{d(u_2^2 + u_3^2) + u_2u_3(u_2 + u_3 + d) + c}.$$

Furthermore, we have

$$v_1 + v_5 = \frac{(c + dv_2)(c + dv_3)}{(v_2 + v_3)^2} \quad\text{and}\quad v_1v_5 = \frac{v_2^2v_3^2 + cdv_2v_3 + c^2(v_2 + v_3)}{(v_2 + v_3)^2}. \qquad (16)$$


Using the above affine formulas, one can obtain fast projective and mixed differential addition and doubling formulas. To speed up these formulas, we consider the following w-coordinates. We write $w_i = c + dv_i$ for $i = 1, 2, \ldots, 5$. Since the curve equation gives $x_i^3 + y_i^3 = c + dx_iy_i$ in characteristic 2, this means $w_i = x_i^3 + y_i^3$. Here, $d \neq 0$. From (15), we have

$$w_4 = \frac{w_2^4 + c^3(d^3 + c)}{d^3w_2^2}.$$

Using the formulas (16), we obtain

$$w_1 + w_5 = \frac{d^3w_2w_3}{(w_2 + w_3)^2} \quad\text{and}\quad w_1w_5 = \frac{w_2^2w_3^2 + c^3(d^3 + c)}{(w_2 + w_3)^2}.$$
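The step from (15) and (16) to the w-relations is a direct substitution of $v_i = (w_i + c)/d$. The Python sketch below spot-checks it numerically. It is illustrative only, with hypothetical helper names, and it assumes the toy field GF(2^7) (modulus $x^7 + x + 1$) and arbitrary constants c and d. Both sides are evaluated at many field elements $v_2, v_3$ rather than at curve points. This suffices because the identities are rational-function identities in $v_2$ and $v_3$.

```python
# Illustrative numeric check that (15) and (16), rewritten with w = c + d v,
# give the w-coordinate relations. Toy field GF(2^7), assumed modulus x^7 + x + 1.
M_BITS = 7
POLY = 0b10000011


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:
            a ^= POLY
    return r


def gf_inv(a):
    r, e = 1, (1 << M_BITS) - 2  # a^(2^m - 2) = a^(-1)
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def sq(a):
    return gf_mul(a, a)


def div(a, b):
    return gf_mul(a, gf_inv(b))


def add_relations_from_v(c, d, v2, v3):
    """(w1 + w5, w1 w5) computed from the v-relations (16)."""
    den = sq(v2 ^ v3)
    s = div(gf_mul(c ^ gf_mul(d, v2), c ^ gf_mul(d, v3)), den)        # v1 + v5
    p = div(sq(gf_mul(v2, v3)) ^ gf_mul(gf_mul(c, d), gf_mul(v2, v3))
            ^ gf_mul(sq(c), v2 ^ v3), den)                            # v1 v5
    # In characteristic 2: w1 + w5 = d(v1 + v5), w1 w5 = c^2 + cd(v1 + v5) + d^2 v1 v5
    return gf_mul(d, s), sq(c) ^ gf_mul(gf_mul(c, d), s) ^ gf_mul(sq(d), p)


def add_relations_w(c, d, v2, v3):
    """The same two quantities from the w-formulas."""
    w2, w3 = c ^ gf_mul(d, v2), c ^ gf_mul(d, v3)
    d3 = gf_mul(d, sq(d))
    k = gf_mul(gf_mul(c, sq(c)), d3 ^ c)                              # c^3 (d^3 + c)
    den = sq(w2 ^ w3)
    return div(gf_mul(d3, gf_mul(w2, w3)), den), div(sq(gf_mul(w2, w3)) ^ k, den)


def dbl_from_v(c, d, v2):
    """w4 = c + d v4 with v4 taken from (15)."""
    v4 = div(sq(sq(v2)) ^ gf_mul(gf_mul(c, d), sq(v2)), gf_mul(sq(d), sq(v2)) ^ sq(c))
    return c ^ gf_mul(d, v4)


def dbl_w(c, d, v2):
    """w4 from the w-doubling relation."""
    w2 = c ^ gf_mul(d, v2)
    d3 = gf_mul(d, sq(d))
    k = gf_mul(gf_mul(c, sq(c)), d3 ^ c)
    return div(sq(sq(w2)) ^ k, gf_mul(d3, sq(w2)))
```

The value $v_2 = c/d$ is excluded in the doubling check, because it gives $w_2 = 0$.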
To obtain projective formulas, we assume that the $w_i$ are given by the fractions $W_i/Z_i$ for $i = 1, 2, 3$. The following explicit formulas give the output $w_5$, defined by $W_5/Z_5$:

$$A = W_2Z_3,\ B = W_3Z_2,\ C = AB,\ U = d^3C,\ V = (A + B)^2,$$
$$Z_5 = Z_1V,\ W_5 = Z_1U + W_1V. \qquad (17)$$

These formulas require 6M + 1S + 1D. Furthermore, setting $Z_1 = 1$ reduces the cost of mixed differential addition with w-coordinates to 4M + 1S + 1D.
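Below is a minimal Python sketch of formulas (17) and their mixed variant. It is illustrative only, with hypothetical helper names, and again assumes the toy field GF(2^7). The sketch checks $W_5/Z_5$ against the affine relation $w_5 = w_1 + d^3w_2w_3/(w_2 + w_3)^2$ and counts the operations.

```python
# Illustrative sketch of the differential addition formulas (17) over the toy
# field GF(2^7) (assumed modulus x^7 + x + 1), with an operation counter.
M_BITS = 7
POLY = 0b10000011


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:
            a ^= POLY
    return r


def gf_inv(a):
    r, e = 1, (1 << M_BITS) - 2  # a^(2^m - 2) = a^(-1)
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


class Ops:
    """Counts multiplications (M), squarings (S) and products by constants (D)."""

    def __init__(self):
        self.M = self.S = self.D = 0

    def mul(self, a, b):
        self.M += 1
        return gf_mul(a, b)

    def sqr(self, a):
        self.S += 1
        return gf_mul(a, a)

    def const(self, k, a):
        self.D += 1
        return gf_mul(k, a)


def diff_add(W1, Z1, W2, Z2, W3, Z3, d3, ops):
    """Formulas (17): returns (W5, Z5); W1/Z1 is the difference, d3 = d^3."""
    A = ops.mul(W2, Z3)
    B = ops.mul(W3, Z2)
    C = ops.mul(A, B)
    U = ops.const(d3, C)
    V = ops.sqr(A ^ B)
    return ops.mul(Z1, U) ^ ops.mul(W1, V), ops.mul(Z1, V)


def diff_add_mixed(w1, W2, Z2, W3, Z3, d3, ops):
    """Mixed version (Z1 = 1): the two products by Z1 disappear."""
    A = ops.mul(W2, Z3)
    B = ops.mul(W3, Z2)
    U = ops.const(d3, ops.mul(A, B))
    V = ops.sqr(A ^ B)
    return U ^ ops.mul(w1, V), V


def affine_w5(w1, w2, w3, d3):
    """Reference value w5 = w1 + d^3 w2 w3 / (w2 + w3)^2."""
    den = gf_mul(w2 ^ w3, w2 ^ w3)
    return w1 ^ gf_mul(gf_mul(d3, gf_mul(w2, w3)), gf_inv(den))
```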

Moreover, we write $w_4$ as the fraction $W_4/Z_4$. Then the explicit doubling formulas

$$A = W_2^2,\ B = Z_2^2,\ C = A + \sqrt{c^3(d^3 + c)}\,B,\ D = d^3B,\ W_4 = C^2,\ Z_4 = AD \qquad (18)$$

use 1M + 3S + 2D. If $c = 1$, i.e., $H_{c,d}$ is a Hessian curve, then the explicit doubling formulas

$$A = W_2^2,\ B = Z_2^2,\ C = A + B,\ D = (1/\sqrt{d^3})\,C,\ W_4 = (B + D)^2,\ Z_4 = AB \qquad (19)$$

use 1M + 3S + 1D.
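The doubling formulas (18) and (19) can be checked the same way. This is an illustrative sketch with hypothetical names, again over the assumed toy field GF(2^7). It assumes the constants $\sqrt{c^3(d^3 + c)}$ and $1/\sqrt{d^3}$ are precomputed, and it compares $W_4/Z_4$ with the affine doubling relation for $w_4$.

```python
# Illustrative sketch of the w-coordinate doubling formulas (18) and (19) over
# the toy field GF(2^7) (assumed modulus x^7 + x + 1).
M_BITS = 7
POLY = 0b10000011


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:
            a ^= POLY
    return r


def gf_inv(a):
    r, e = 1, (1 << M_BITS) - 2  # a^(2^m - 2) = a^(-1)
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def gf_sqrt(a):
    for _ in range(M_BITS - 1):  # a^(2^(m-1))
        a = gf_mul(a, a)
    return a


class Ops:
    def __init__(self):
        self.M = self.S = self.D = 0

    def mul(self, a, b):
        self.M += 1
        return gf_mul(a, b)

    def sqr(self, a):
        self.S += 1
        return gf_mul(a, a)

    def const(self, k, a):
        self.D += 1
        return gf_mul(k, a)


def dbl(W2, Z2, k18, d3, ops):
    """Formulas (18); k18 = sqrt(c^3 (d^3 + c)), d3 = d^3."""
    A, B = ops.sqr(W2), ops.sqr(Z2)
    C = A ^ ops.const(k18, B)
    D = ops.const(d3, B)
    return ops.sqr(C), ops.mul(A, D)


def dbl_hessian(W2, Z2, k19, ops):
    """Formulas (19) for c = 1; k19 = 1/sqrt(d^3)."""
    A, B = ops.sqr(W2), ops.sqr(Z2)
    D = ops.const(k19, A ^ B)
    return ops.sqr(B ^ D), ops.mul(A, B)


def affine_w4(w2, c, d3):
    """Reference value w4 = (w2^4 + c^3 (d^3 + c)) / (d^3 w2^2)."""
    k = gf_mul(gf_mul(c, gf_mul(c, c)), d3 ^ c)
    w2sq = gf_mul(w2, w2)
    return gf_mul(gf_mul(w2sq, w2sq) ^ k, gf_inv(gf_mul(d3, w2sq)))
```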

As a result, the total cost of projective w-coordinate differential addition and doubling is 7M + 4S + 3D. The mixed w-coordinate differential addition and doubling formulas use 5M + 4S + 3D. For Hessian curves $H_{1,d}$, the total costs of projective and mixed w-coordinate differential addition and doubling are 7M + 4S + 2D and 5M + 4S + 2D, respectively. Furthermore, if the parameter d of the curve $H_{c,d}$ is chosen small, then these total costs reduce to 7M + 4S + 1D and 5M + 4S + 1D, respectively. Moreover, from Proposition [TJ we can see that the mixed w-coordinate addition and doubling formulas are complete.
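The recursion recalled at the start of this section is a Montgomery-style ladder. Each bit of the scalar costs one mixed differential addition and one doubling, which is the 5M + 4S + 3D counted above. The Python sketch below is illustrative only, with hypothetical names, and again assumes the toy field GF(2^7).

Its test relies on one fact. A nonzero 3-torsion point $(0, t)$ with $t^3 = c$ has $w = t^3 = c$, and so does its double $(t, 0)$. The ladder started at $w_1 = c$ should therefore return $w = c$ when $3 \nmid k$, and a representation with $Z = 0$ (the neutral element) when $3 \mid k$. Only the value $w_1 = c$ is used, not the point itself.

```python
# Illustrative sketch: a w-coordinate ladder built from the mixed formulas (17)
# and the doubling formulas (18), over the toy field GF(2^7).
M_BITS = 7
POLY = 0b10000011  # assumed modulus x^7 + x + 1


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:
            a ^= POLY
    return r


def gf_sqrt(a):
    for _ in range(M_BITS - 1):  # a^(2^(m-1))
        a = gf_mul(a, a)
    return a


class Ops:
    def __init__(self):
        self.M = self.S = self.D = 0

    def mul(self, a, b):
        self.M += 1
        return gf_mul(a, b)

    def sqr(self, a):
        self.S += 1
        return gf_mul(a, a)

    def const(self, k, a):
        self.D += 1
        return gf_mul(k, a)


def dbl(W2, Z2, k18, d3, ops):
    """Formulas (18): 1M + 3S + 2D; k18 = sqrt(c^3 (d^3 + c)), d3 = d^3."""
    A, B = ops.sqr(W2), ops.sqr(Z2)
    C = A ^ ops.const(k18, B)
    D = ops.const(d3, B)
    return ops.sqr(C), ops.mul(A, D)


def diff_add_mixed(w1, W2, Z2, W3, Z3, d3, ops):
    """Formulas (17) with Z1 = 1: 4M + 1S + 1D; w1 is the fixed difference."""
    A, B = ops.mul(W2, Z3), ops.mul(W3, Z2)
    U = ops.const(d3, ops.mul(A, B))
    V = ops.sqr(A ^ B)
    return U ^ ops.mul(w1, V), V


def ladder(k, w1, consts, ops):
    """Return (W, Z) with w(kP) = W/Z for k >= 1, given the affine w1 = w(P).
    Invariant: R0 = w(mP), R1 = w((m+1)P), so R1 - R0 = P at every step."""
    d3, k18 = consts
    R0 = (w1, 1)
    R1 = dbl(w1, 1, k18, d3, ops)
    for bit in bin(k)[3:]:  # bits of k after the leading 1
        if bit == "1":
            R0, R1 = diff_add_mixed(w1, *R0, *R1, d3, ops), dbl(*R1, k18, d3, ops)
        else:
            R0, R1 = dbl(*R0, k18, d3, ops), diff_add_mixed(w1, *R0, *R1, d3, ops)
    return R0
```

After the initial doubling, each ladder step adds exactly 5M + 4S + 3D, matching the mixed cost stated above.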


Table 2. Cost of differential addition and doubling for families of binary elliptic curves

Curve shape | Representation | Projective differential addition+doubling | Mixed differential addition+doubling
Short Weierstraß: $y^2 + xy = x^3 + a_2x^2 + a_6$ | XZ ($x = X/Z$) [28] | 7M + 5S + 1D | 5M + 5S + 1D
 | XZ ($x = X/Z$) [...] | 6M + 5S + 1D | 5M + 5S + 1D
 | XZ ($x = X/Z$) [..., §3.1] | 7M + 4S + 1D | 5M + 4S + 1D
 | XZ ($x = X/Z$) [..., §3.2] | 6M + 5S + 2D | 5M + 5S + 2D
Binary Edwards: $d_1(x + y) + d_2(x^2 + y^2) = xy + xy(x + y) + x^2y^2$ | WZ ($x + y = W/Z$) [6] | 8M + 4S + 4D | 6M + 4S + 4D
 | WZ with $d_1 = d_2$ [5] | 7M + 4S + 2D | 5M + 4S + 2D
Hessian: $x^3 + y^3 + 1 = dxy$ | WZ ($1 + dxy = W/Z$), formulas (17), (19) | 7M + 4S + 2D | 5M + 4S + 2D
Generalized Hessian: $x^3 + y^3 + c = dxy$ | WZ ($c + dxy = W/Z$), formulas (17), (18) | 7M + 4S + 3D | 5M + 4S + 3D
 | WZ with small d, formulas (17), (18) | 7M + 4S + 1D | 5M + 4S + 1D
Table 2 shows the cost of differential addition and doubling for different coordinate systems on binary elliptic curves. From Table 2, we see that our w-coordinate representations for generalized Hessian curves are competitive with other representations for binary elliptic curves.


6 Conclusion 

In this paper, we have presented the family of generalized Hessian curves. This family covers more isomorphism classes of elliptic curves than Hessian curves. For every elliptic curve E over a finite field $\mathbb{F}_q$, the group $E(\mathbb{F}_q)$ has a point of order 3 if and only if E is isomorphic over $\mathbb{F}_q$ to a generalized Hessian curve.

Unified addition formulas have been presented for generalized Hessian curves $H_{c,d}$ over a field $\mathbb{F}$; see formulas (0, (flTTl). In particular, these formulas are unified for Hessian curves $H_{1,d}$. Further, the formulas are complete if c is not a cube in $\mathbb{F}$.

The cost of the projective formulas using algorithm (1111) is 12M + 1D. The mixed addition formulas require 10M + 1D. For generalized Hessian curves $H_{c,d}$ over $\mathbb{F}$ with characteristic $p \neq 2$, the projective addition formulas m using extended coordinates have a cost of 6M + 6S + 2D. The mixed formulas require 5M + 5S + 2D.

When p = 2, the generalized binary Hessian curves provide very fast and efficient addition formulas. Projective formulas (fill) require 12M + 1D, and the mixed addition formulas need 10M + 1D. Moreover, using the extended coordinates, formulas (THU) perform a projective addition using 9M + 3S + 2D and a mixed addition using 8M + 3S + 2D. Several doubling and tripling formulas have been presented for generalized Hessian curves which improve on the previous doubling and tripling formulas for Hessian curves. Very competitive differential addition and doubling formulas have also been presented for generalized binary Hessian curves.
A On the Number of Distinct Generalized Hessian Curves

A.1 The Number of Distinct j-Invariants

We recall from m that the number of distinct Hessian curves over the finite field $\mathbb{F}_q$, up to isomorphism over $\overline{\mathbb{F}}_q$, is $q - 1$, $\lfloor (q + 11)/12 \rfloor$ and $\lfloor q/2 \rfloor$ if $q \equiv 0, 1, 2 \pmod 3$, respectively. Using a method similar to the one described in |16I17|, we give explicit formulas for the number of distinct generalized Hessian curves over the finite field $\mathbb{F}_q$, up to isomorphism over $\overline{\mathbb{F}}_q$.

From Equation @, the j-invariant of $H_{c,d}$ is

$$j(H_{c,d}) = \frac{d^3(d^3 + 216c)^3}{c(d^3 - 27c)^3}.$$

We use $J_H$ to denote the set of distinct j-invariants of the family of generalized Hessian curves over $\mathbb{F}_q$, and we let $J_H(q) = \#J_H$. For $c \in \mathbb{F}_q$ with $c \neq 0$, we let

$$J_{H_c} = \{\, j \mid j = j(H_{c,d}),\ d \in \mathbb{F}_q,\ d^3 \neq 27c \,\}.$$

Clearly, $J_H = \bigcup_{c \in \mathbb{F}_q^*} J_{H_c}$.

Lemma 1. Let $c_1, c_2 \in \mathbb{F}_q^*$ and let $c = c_1/c_2$. If c is a cube in $\mathbb{F}_q$, then $J_{H_{c_1}} = J_{H_{c_2}}$. If c is not a cube in $\mathbb{F}_q$, then $J_{H_{c_1}} \cap J_{H_{c_2}} = \{0\}$.
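Proof. Suppose $c = e^3$ is a cube in $\mathbb{F}_q$. For all $d \in \mathbb{F}_q$ with $d^3 \neq 27c_1$, we have $j(H_{c_1,d}) = j(H_{c_2,d/e})$, and similarly $j(H_{c_2,d}) = j(H_{c_1,ed})$. Therefore, $J_{H_{c_1}} = J_{H_{c_2}}$.

Now, suppose that c is not a cube in $\mathbb{F}_q$. Let $j \in J_{H_{c_1}} \cap J_{H_{c_2}}$. Then

$$j = \frac{d_1^3(d_1^3 + 216c_1)^3}{c_1(d_1^3 - 27c_1)^3} = \frac{d_2^3(d_2^3 + 216c_2)^3}{c_2(d_2^3 - 27c_2)^3}$$

for some $d_1, d_2 \in \mathbb{F}_q$. If $j \neq 0$, dividing one expression by the other shows that $c = c_1/c_2$ is a cube in $\mathbb{F}_q$, a contradiction. So, $J_{H_{c_1}} \cap J_{H_{c_2}} = \{0\}$. □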


Lemma 2. For $q \equiv 1 \pmod 3$, if c is not a cube in $\mathbb{F}_q$, we have $\#J_{H_c} = (q + 2)/3$.


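Proof. For $d \in \mathbb{F}_q$ with $d^3 \neq 27c$, we let $j(H_{c,d}) = \frac{1}{c}(F(d))^3$, where $F(U) = \frac{U(U^3 + 216c)}{U^3 - 27c}$. We consider the bivariate rational function $F(U) - F(V)$ and obtain

$$\frac{F(U) - F(V)}{U - V} = \frac{1}{U^3 - 27c}\prod_{i=1}^{3}\frac{(U - 3\zeta_i)V - 3\zeta_i(U + 6\zeta_i)}{V - 3\zeta_i},$$

where $\zeta_1, \zeta_2, \zeta_3$ are the three cube roots of c in $\overline{\mathbb{F}}_q$. Since c is not a cube in $\mathbb{F}_q$, each $\zeta_i$ has degree 3 over $\mathbb{F}_q$. For $u \neq v$ in $\mathbb{F}_q$, a vanishing numerator factor would make $\zeta_i$ a root of a nonzero polynomial of degree at most 2 over $\mathbb{F}_q$, which is impossible. Hence, for all $u, v \in \mathbb{F}_q$ with $u^3 \neq 27c$ and $v^3 \neq 27c$, we have $F(u) = F(v)$ if and only if $u = v$. Moreover, $u^3 \neq 27c$ holds for every $u \in \mathbb{F}_q$, because $27c$ is not a cube. Therefore, F is injective on $\mathbb{F}_q$ and $F(\mathbb{F}_q) = \mathbb{F}_q$.

Now, consider the map $\kappa : \mathbb{F}_q \to \mathbb{F}_q$ given by $\kappa(x) = x^3/c$, so that $J_{H_c} = \kappa(\mathbb{F}_q)$. Since $q \equiv 1 \pmod 3$, this map is 3 : 1 on $\mathbb{F}_q^*$, and $\kappa(0) = 0$. So, $\#J_{H_c} = (q - 1)/3 + 1 = (q + 2)/3$. □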

Theorem 4. For any prime power q, the number $J_H(q)$ of distinct values of the j-invariant of the family of generalized Hessian curves over the finite field $\mathbb{F}_q$ is

$$J_H(q) = \begin{cases} q - 1, & \text{if } q \equiv 0 \pmod 3,\\ \lfloor (3q + 1)/4 \rfloor, & \text{if } q \equiv 1 \pmod 3,\\ \lfloor q/2 \rfloor, & \text{if } q \equiv 2 \pmod 3. \end{cases}$$

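Proof. If $q \not\equiv 1 \pmod 3$, every element of $\mathbb{F}_q$ is a cube in $\mathbb{F}_q$. Lemma 1 then implies that, for all $c \in \mathbb{F}_q^*$, we have $J_{H_c} = J_{H_1}$. Therefore, $J_H = J_{H_1}$, and from [TF, Theorem 14] we have

$$J_H(q) = \begin{cases} q - 1, & \text{if } q \equiv 0 \pmod 3,\\ \lfloor q/2 \rfloor, & \text{if } q \equiv 2 \pmod 3. \end{cases}$$

For $q \equiv 1 \pmod 3$, we fix a value $c \in \mathbb{F}_q$ that is not a cube in $\mathbb{F}_q$. Following Lemma 1, we write $J_H = J_{H_c} \cup J_{H_{c^2}} \cup J_{H_1}$, where

$$J_{H_c} \cap J_{H_{c^2}} = J_{H_c} \cap J_{H_1} = J_{H_{c^2}} \cap J_{H_1} = \{0\}.$$

By Lemma 2, we have $\#J_{H_c} = \#J_{H_{c^2}} = (q + 2)/3$. Moreover, from [TF, Theorem 14], we have

$$\#J_{H_1} = \begin{cases} (q + 11)/12, & \text{if } q \equiv 1 \pmod{12},\\ (q + 8)/12, & \text{if } q \equiv 4 \pmod{12},\\ (q + 5)/12, & \text{if } q \equiv 7 \pmod{12}. \end{cases}$$

Since the three sets pairwise intersect only in $\{0\}$, we have $J_H(q) = 2(q + 2)/3 + \#J_{H_1} - 2$, that is,

$$J_H(q) = \begin{cases} (3q + 1)/4, & \text{if } q \equiv 1 \pmod{12},\\ 3q/4, & \text{if } q \equiv 4 \pmod{12},\\ (3q - 1)/4, & \text{if } q \equiv 7 \pmod{12}, \end{cases}$$

which completes the proof.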


□ 
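The counts in Lemmas 1 and 2 and Theorem 4 can be spot-checked by brute force over small prime fields. The Python sketch below is illustrative, and its names are hypothetical. It uses $j(H_{c,d}) = d^3(d^3 + 216c)^3/(c(d^3 - 27c)^3)$ and enumerates all $c \in \mathbb{F}_p^*$ and $d \in \mathbb{F}_p$. Restricting to prime q = p is an assumption of the sketch, not of the theorem. The code requires Python 3.8 or later for `pow(x, -1, p)`.

```python
# Brute-force sketch: distinct j-invariants of generalized Hessian curves
# x^3 + y^3 + c = dxy over a prime field F_p (prime fields only).


def j_invariant(c, d, p):
    """j(H_{c,d}) = d^3 (d^3 + 216c)^3 / (c (d^3 - 27c)^3); None if d^3 = 27c."""
    d3 = pow(d, 3, p)
    den = c * pow(d3 - 27 * c, 3, p) % p
    if den == 0:  # singular curve, excluded
        return None
    return d3 * pow(d3 + 216 * c, 3, p) * pow(den, -1, p) % p


def J_c(c, p):
    """The set J_{H_c}."""
    values = (j_invariant(c, d, p) for d in range(p))
    return {j for j in values if j is not None}


def J_count(p):
    """J_H(p): the number of distinct j-invariants over all c in F_p^*."""
    return len(set().union(*(J_c(c, p) for c in range(1, p))))


def predicted(q):
    """The closed form of Theorem 4."""
    if q % 3 == 0:
        return q - 1
    if q % 3 == 1:
        return (3 * q + 1) // 4
    return q // 2
```

For example, over $\mathbb{F}_7$ the element 2 is not a cube, and `J_c(2, 7)` has $(7 + 2)/3 = 3$ elements, as Lemma 2 predicts.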


A.2 The Number of $\mathbb{F}_q$-Isomorphism Classes

We recall from [TB] that the number of $\mathbb{F}_q$-isomorphism classes of Hessian curves over $\mathbb{F}_q$ is $\lfloor (q + 11)/12 \rfloor$ if $q \equiv 1 \pmod 3$ and $q - 1$ if $q \not\equiv 1 \pmod 3$. The following theorem gives explicit formulas for the number of distinct generalized Hessian curves, up to $\mathbb{F}_q$-isomorphism, over the finite field $\mathbb{F}_q$.

Theorem 5. For any prime power q, the number of $\mathbb{F}_q$-isomorphism classes of the family of generalized Hessian curves over the finite field $\mathbb{F}_q$ is

$$\begin{cases} \lfloor 3(q + 3)/4 \rfloor, & \text{if } q \equiv 1 \pmod 3,\\ q - 1, & \text{if } q \equiv 0, 2 \pmod 3. \end{cases}$$


Proof. We use $I_H(q)$ to denote the number of $\mathbb{F}_q$-isomorphism classes of the family of generalized Hessian curves over $\mathbb{F}_q$.

If $q \equiv 0, 2 \pmod 3$, then every generalized Hessian curve is $\mathbb{F}_q$-isomorphic to a Hessian curve via the map given by Equations dT|). So, $I_H(q)$ equals the number of $\mathbb{F}_q$-isomorphism classes of the family of Hessian curves over $\mathbb{F}_q$. Then, from [TB, Theorem 15], we have $I_H(q) = q - 1$ if $q \not\equiv 1 \pmod 3$.

Now, suppose that $q \equiv 1 \pmod 3$. For $a \in \mathbb{F}_q$, let $I_H(a)$ be the set of $\mathbb{F}_q$-isomorphism classes of generalized Hessian curves $H_{c,d}$ with $j(H_{c,d}) = a$. So, $\#I_H(a)$ is the number of distinct generalized Hessian curves with j-invariant a that are twists of each other. Clearly, $\#I_H(a) = 0$ if $a \notin J_H$.

We note that, for every elliptic curve E over $\mathbb{F}_q$, we have $\#E(\mathbb{F}_q) + \#E^t(\mathbb{F}_q) = 2q + 2$, where $E^t$ is the nontrivial quadratic twist of E. We also recall that the order of the group of $\mathbb{F}_q$-rational points of a generalized Hessian curve is divisible by 3 (see Theorem 2). Since $q \equiv 1 \pmod 3$, we have $2q + 2 \equiv 1 \pmod 3$. Hence, if the isomorphism class of $H_{c,d}$ is in $I_H(a)$, then the nontrivial quadratic twist of $H_{c,d}$ has a group order not divisible by 3, and its isomorphism class is not in $I_H(a)$. So, $\#I_H(a) = 1$ if $a \in J_H$ and $a \neq 0, 1728$. Moreover, one can show that $\#I_H(a) = 3$ if $a = 0$, and $\#I_H(a) = 1$ if $a = 1728$, $a \neq 0$ and $a \in J_H$. Therefore, we have

$$I_H(q) = \sum_{a \in \mathbb{F}_q} \#I_H(a) = \sum_{a \in J_H} \#I_H(a) = 2 + \sum_{a \in J_H} 1 = 2 + J_H(q).$$

From the proof of Theorem 4, we have

$$I_H(q) = 2 + \lfloor (3q + 1)/4 \rfloor = \lfloor 3(q + 3)/4 \rfloor,$$

which completes the proof.


□ 
Abstract. Proxy re-encryption (PRE) is a cryptographic application proposed by Blaze, Bleumer, and Strauss. It is an encryption system with a special property: a semi-honest third party, the proxy, can re-encrypt ciphertexts for Alice into ciphertexts for Bob without using Alice’s secret key. PRE schemes can be classified as bidirectional or unidirectional. Canetti and Hohenberger formalized semantic security under chosen-ciphertext attack for PRE, called PRE-CCA security. Several bidirectional and unidirectional schemes satisfy PRE-CCA security. However, before our work, every PRE-CCA secure scheme either needed a bilinear map in the standard model or was proven secure only in the random oracle model. In this paper, we construct a bidirectional PRE-CCA secure proxy re-encryption scheme without bilinear maps in the standard model. We study the lossy trapdoor functions (LTDFs) based on the decisional Diffie-Hellman (DDH) assumption proposed by Peikert and Waters. We define a new variant of LTDFs, re-applicable LTDFs, which are LTDFs specialized for PRE, and use them in our scheme.


1 Introduction 

1.1 Background 

Proxy re-encryption (PRE) is a cryptographic application proposed by Blaze, Bleumer, and Strauss [4]. It is an encryption system with a special property: a semi-honest third party, the proxy, can re-encrypt ciphertexts for Alice into ciphertexts for Bob without using Alice’s secret key. In other words, if the proxy has a re-encryption key $rk_{A \to B}$ from Alice to Bob, it can translate a ciphertext $C_A$ under Alice’s public key $pk_A$ into a ciphertext $C_B$ under Bob’s public key $pk_B$. This translation requires only $rk_{A \to B}$ and keeps the message m secret from the proxy. PRE has many cryptographic applications, such as email forwarding, secure file systems, DRM, and secure mailing lists I2l4ll ll2lj.

Ivan and Dodis classified PRE schemes into two types, bidirectional and unidirectional m. In a bidirectional scheme, a proxy holding the re-encryption key $rk_{A \to B}$ can re-encrypt ciphertexts not only from Alice to Bob, but also in the opposite direction. That is, we can assume that the re-encryption key $rk_{A \to B}$ from Alice to Bob is identical to the key $rk_{B \to A}$ from Bob to Alice. In a unidirectional scheme, the re-encryption key $rk_{A \to B}$ from Alice to Bob never helps re-encryption in the opposite direction. Ivan and Dodis also considered the notions of single-hop and multi-hop schemes. In a single-hop scheme, a re-encrypted ciphertext cannot be further re-encrypted. In a multi-hop scheme, by contrast, a ciphertext can be re-encrypted from Alice to Bob, then to Carol, and so on. We discuss only bidirectional schemes in this paper.

P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010. LNCS 6056, pp. 261-278, 2010.

The PRE-CCA Security. The security notion of indistinguishability against chosen-ciphertext attacks (IND-CCA) for encryption systems was proposed by Naor and Yung m. Many IND-CCA secure encryption systems have been proposed.

Canetti and Hohenberger applied the notion of IND-CCA to bidirectional PRE [6] and formalized it as (bidirectional) PRE-CCA security. They also investigated simulation-based security definitions that guarantee universally composable security. They constructed a bidirectional and multi-hop PRE-CCA scheme based on the decisional bilinear Diffie-Hellman (DBDH) assumption, which requires bilinear maps. Later, several research groups proposed bidirectional or unidirectional PRE-CCA schemes that use bilinear maps or are proven secure in the random oracle model [7,12,20,22].

1.2 Our Contribution 

We construct a bidirectional and multi-hop PRE-CCA scheme without bilinear maps in 
the standard model. All previous PRE-CCA secure schemes in the standard model use 
bilinear maps. Our scheme is constructed in three steps. 

First, we define a new cryptographic primitive, re-applicable lossy trapdoor functions (re-applicable LTDFs), which are lossy trapdoor functions specialized for PRE. They consist of nine algorithms, (ParGen, LossyGen, LossyEval, LossyInv, ReIndex, ReEval, PrivReEval, Trans, FakeKey), and a set of tags $\mathcal{T}$. Second, we construct a bidirectional PRE-CCA scheme from re-applicable LTDFs. To do so, we modify the original Peikert and Waters encryption scheme in several respects. Third, we construct re-applicable LTDFs based on the decisional Diffie-Hellman (DDH) assumption.

Our Techniques. As stated above, we modify the original Peikert and Waters encryption scheme in several respects for PRE.

- Our PRE scheme uses an index of all-but-one functions as the public parameter, whereas the original scheme uses this index as part of a public key.
- Our scheme signs only a part of the ciphertext, $(c_2, c_3)$, during encryption, whereas the original scheme signs all the main parts of a ciphertext, $(c_1, c_2, c_3)$.
- The most important difference is that our scheme re-encrypts $c_1$ from $pk_A$ to $pk_B$ by using $rk_{A \leftrightarrow B}$.

To construct re-applicable LTDFs, we modify the LTDFs proposed by Peikert and Waters in one respect: an injective index is not $\mathrm{Enc}_{pk}(I)$ but $\mathrm{Enc}_{pk}(\tau I)$, where $\tau$ is a tag and $I$ is the identity matrix. This technical change makes the construction satisfy the definition of re-applicable LTDFs.

The Peikert and Waters Encryption. Peikert and Waters proposed LTDFs and used them to construct an IND-CCA secure public-key encryption scheme [17]. We briefly review their encryption scheme.

Let $f_s$ and $f_{s'}$ be functions with domain $\{0,1\}^n$. Here $f_s$ is an injective function with a trapdoor td, and $f_{s'}$ is a lossy function whose range has size at most $2^{n-k}$. LTDFs have the following property: given f, a distinguisher cannot tell whether f is $f_s$ or $f_{s'}$. Peikert and Waters constructed them as $f_s(x) = x\,\mathrm{Enc}_{pk}(I)$ and $f_{s'}(x) = x\,\mathrm{Enc}_{pk}(O)$, where $I$ and $O$ are the identity and zero matrices and Enc is a homomorphic matrix encryption. From the homomorphism of Enc, we obtain $f_s(x) = x\,\mathrm{Enc}_{pk}(I) = \mathrm{Enc}_{pk}(xI) = \mathrm{Enc}_{pk}(x)$ and $f_{s'}(x) = x\,\mathrm{Enc}_{pk}(O) = \mathrm{Enc}_{pk}(xO) = \mathrm{Enc}_{pk}(0)$. We can recover x from $f_s(x)$ with a secret key sk. On the other hand, x cannot be obtained from $f_{s'}(x)$, even information-theoretically.

Peikert and Waters also proposed all-but-one trapdoor functions, which have a property similar to that of LTDFs. Let $g_{s,b^*}$ be a function with domain $B \times \{0,1\}^n$, where B is a finite set and $b^* \in B$. For every $b \neq b^*$, $g_{s,b^*}(b, \cdot)$ is an invertible function with a trapdoor td. On the other hand, $g_{s,b^*}(b^*, \cdot)$ is a lossy function.
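The following toy Python sketch makes the matrix construction concrete. It is written in the spirit of the DDH-based construction reviewed above and rests on these assumptions:

- The matrix encryption is ElGamal-style: row i reuses a randomness $r_i$, and column j is encrypted under a secret exponent $z_j$.
- Messages are placed in the exponent.
- The group is tiny and insecure: the order-11 subgroup of $\mathbb{Z}_{23}^*$.

All function names are hypothetical. On the injective index, the trapdoor recovers every input. The lossy index has at most 11 distinct images for 16 inputs.

```python
import random

# Toy group (insecure, illustration only): the subgroup of order Q = 11 in
# Z_23^*, generated by G = 4.
P, Q, G = 23, 11, 4


def gen_index(n, injective, rng):
    """Index = n x n matrix of ElGamal encryptions of I (injective) or O (lossy).
    Row i reuses randomness r_i; column j is encrypted under secret key z_j."""
    r = [rng.randrange(1, Q) for _ in range(n)]
    z = [rng.randrange(1, Q) for _ in range(n)]
    index = [[(pow(G, r[i], P),
               pow(G, r[i] * z[j] + (1 if injective and i == j else 0), P))
              for j in range(n)] for i in range(n)]
    return index, z  # z plays the role of the trapdoor


def evaluate(index, x):
    """f(x) = x * Enc(M): multiply together the rows selected by the bits of x.
    Output: (g^<r,x>, (g^(<r,x> z_j + m_j))_j)."""
    n = len(x)
    c0, cols = 1, [1] * n
    for i in range(n):
        if x[i]:
            c0 = c0 * index[i][0][0] % P
            for j in range(n):
                cols[j] = cols[j] * index[i][j][1] % P
    return c0, tuple(cols)


def invert(z, y):
    """With the trapdoor: cols[j] / c0^(z_j) = g^(x_j), which is 1 iff x_j = 0."""
    c0, cols = y
    return [0 if cols[j] == pow(c0, z[j], P) else 1 for j in range(len(z))]
```

In the lossy case, the whole output depends on x only through $\langle r, x \rangle \bmod 11$. This is where the information about x is lost.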

Peikert and Waters constructed their IND-CCA encryption scheme from $f_s$ and $g_{s,b^*}$ as follows. The encryption algorithm randomly chooses $x \in \{0,1\}^n$ and selects a key pair $(vk, sk_\sigma)$ of a one-time signature scheme. Then it computes a ciphertext as

$$C = (vk, c_1, c_2, c_3, \sigma) = \big(vk,\ f_s(x),\ g_{s,b^*}(vk, x),\ h(x) \oplus m,\ \mathrm{SigSign}(sk_\sigma, (c_1, c_2, c_3))\big),$$

where h is a pairwise independent hash function and SigSign is the signing algorithm of the signature scheme. They proved that this scheme is IND-CCA secure.

Observation of the Peikert and Waters Encryption: Free Part for Signature. The above encryption algorithm must sign all of $(c_1, c_2, c_3)$ to produce the signature $\sigma$. The signature $\sigma$ and vk guarantee that $(c_1, c_2, c_3)$ was signed with the signing key $sk_\sigma$. That is, $(c_1, c_2, c_3)$ is fixed by $\sigma$ and vk. For this fixed $(c_1, c_2, c_3)$, the Peikert and Waters encryption achieves IND-CCA security.

However, we find that it is not necessary to sign all of $(c_1, c_2, c_3)$ to achieve IND-CCA security. In particular, $c_1$ need not be signed. The reason is as follows.

- If $(c_1, c_2, c_3)$ is fixed by $\sigma$ and vk, then the randomness x and the message m are also fixed, because $f_s$ and $g_{s,b^*}$ are injective. That is, we can regard $\sigma$ as a signature on x and m as well as on $(c_1, c_2, c_3)$.
- Conversely, if x and m are determined, then $(c_1, c_2, c_3)$ is determined. Hence, what must be signed is x and m, not $(c_1, c_2, c_3)$.
- We therefore replace the signature on $(c_1, c_2, c_3)$ with a signature on $(c_2, c_3)$. The pair (x, m) is still fixed, because $g_{s,b^*}$ is injective on every branch $vk \neq b^*$ and $c_3 = h(x) \oplus m$ then determines m.

Thus the signature on $(c_2, c_3)$ performs the tasks of a signature on $(c_1, c_2, c_3)$, and we do not need to sign $c_1$ to achieve IND-CCA security. This free $c_1$ is essential to our proposed scheme, which satisfies bidirectional PRE-CCA security.

Remarks on Our Scheme. We discuss two points of our scheme: (1) comparison of efficiency, and (2) construction based on other assumptions.

Our scheme uses an index of re-applicable LTDFs as a public key. We represent this key as an $n \times n$ matrix, which has $n^2$ group elements. Like the public keys, the public parameter contains an $n \times n$ matrix, since the all-but-one trapdoor functions are also based on the DDH assumption. One ciphertext and one re-encryption key each have $O(n)$ group elements. In most previous schemes, by contrast, they consist of a constant number of group elements. For example, the bidirectional scheme of Canetti and Hohenberger also satisfies PRE-CCA security. In that scheme, one public key is a single group element, and one ciphertext consists of five group elements, a verification key, and a signature. Our scheme therefore has larger time and space complexity than previous schemes, and constructing more efficient schemes is left as future work.

LTDFs can be constructed from various assumptions, so one might expect PRE-CCA secure schemes to follow from other assumptions as well. However, we do not currently know how to construct PRE-CCA secure schemes from other assumptions. Factoring-, quadratic residuosity-, RSA-, and Paillier-based LTDFs do not clearly satisfy the definition of re-applicable lossy trapdoor functions. Decision linear or lattice-based LTDFs might seem to work in our proposal, but this is not clear, since they do not guarantee several properties of re-applicable LTDFs. Nevertheless, LTDFs based on these assumptions might still be usable for PRE in combination with other techniques.

1.3 Related Work 

We review previous work on PRE and LTDFs. 

Proxy Re-Encryptions. Mambo and Okamoto first proposed the concept of proxy encryption, which delegates the ability to decrypt through an interaction [13]. Based on their concept, Blaze, Bleumer, and Strauss proposed the notion of proxy cryptography, of which PRE is one instance [4]. Their construction is very similar to ElGamal encryption and satisfies CPA security. In their scheme, a re-encryption key is obtained by dividing one secret key by the other. Later, Ivan and Dodis pointed out that the scheme of Blaze et al. is bidirectional and multi-hop.

Ateniese, Fu, Green, and Hohenberger proposed the first unidirectional scheme, which is single-hop and satisfies CPA security using a bilinear map [2]. Deng, Weng, Liu, and Chen constructed a bidirectional and single-hop PRE scheme without a bilinear map in the random oracle model [7]. Libert and Vergnaud discussed unidirectional PRE-CCA security and constructed a unidirectional and single-hop PRE-RCCA¹ scheme with a bilinear map [12]². Shao and Cao proposed a unidirectional and single-hop PRE-CCA scheme in the random oracle model [20]. Weng, Chow, Yang, and Deng improved Shao and Cao’s scheme; their scheme is also in the random oracle model [22].

We list this previous work in chronological order in Figure 1, where ROM stands for the random oracle model.
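The ElGamal-like mechanism described above, where the re-encryption key is a quotient of secret keys, can be sketched in Python. This is a toy illustration of the commonly cited form of the Blaze-Bleumer-Strauss scheme, not a transcription of the original paper:

- public key $pk = g^a$;
- ciphertext $(m g^r, g^{ar})$;
- re-encryption key $rk_{A \to B} = b/a \bmod q$.

The tiny group and the function names are assumptions for illustration only. Applying the inverse key $a/b$ undoes the translation, which is why the scheme is bidirectional.

```python
import random

# Toy prime-order group (insecure): order Q = 11 subgroup of Z_23^*, generator 4.
P, Q, G = 23, 11, 4


def keygen(rng):
    sk = rng.randrange(1, Q)
    return sk, pow(G, sk, P)


def encrypt(pk, m, rng):
    """(m * g^r, pk^r) = (m * g^r, g^(a r)) for pk = g^a; m is a nonzero residue mod P."""
    r = rng.randrange(1, Q)
    return m * pow(G, r, P) % P, pow(pk, r, P)


def rekey(sk_from, sk_to):
    """Re-encryption key b/a mod Q: the quotient of the two secret keys."""
    return sk_to * pow(sk_from, -1, Q) % Q


def reencrypt(rk, ct):
    """(g^(a r))^(b/a) = g^(b r); the proxy never sees m or the secret keys."""
    c1, c2 = ct
    return c1, pow(c2, rk, P)


def decrypt(sk, ct):
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, Q), P)  # (g^(s r))^(1/s) = g^r
    return c1 * pow(g_r, -1, P) % P
```

Because re-encryption only changes the second component, a re-encrypted ciphertext can be re-encrypted again under a further key, which matches the multi-hop property noted above. The modular inverses use `pow(x, -1, n)`, available from Python 3.8.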

Hohenberger, Rothblum, shelat, and Vaikuntanathan studied the existence of obfuscators in connection with PRE and constructed a practical obfuscator for a re-encryption functionality @. Their scheme is CPA-secure.

Recently, Ateniese, Benson, and Hohenberger introduced an additional property of PRE, key privacy (or anonymity) m. This property is desirable if the proxy can freely re-encrypt ciphertexts. Our scheme does not have the key-privacy property. Ateniese et al. constructed a key-private scheme with bilinear maps in the standard model, which satisfies PRE-CPA security but not PRE-CCA security.

1 This security notion is strictly weaker than CCA security and stronger than CPA security. The adversary may use a decryption oracle only in a restricted way: after sending messages $(m_0, m_1)$ to the challenger and obtaining a challenge c, the adversary cannot query ciphertexts of the challenge messages $m_0$ and $m_1$ to the decryption oracle.

2 Two research groups raised questions about the security model of this paper and published their discussions on the ePrint Archive Elm. However, these discussions do not affect our results, and we do not address them in this paper.
Authors | Direction | Hop | Assumption | Security | Bilinear map | ROM
Blaze et al. [4] | ↔ | multi | DDH on $\mathbb{G}_p$ | PRE-CPA | no | no
Ateniese et al. [2] | → | single | eDBDH | PRE-CPA | yes | no
Canetti and Hohenberger [6] | ↔ | multi | DBDH | PRE-CCA | yes | no
Libert and Vergnaud [12] | → | single | 3-wDBDHI | PRE-RCCA | yes | no
Deng et al. [7] | ↔ | single | mCDH on $\mathbb{G}_p$ | PRE-CCA | no | yes
Shao and Cao [20] | → | single | DDH on Z fp | PRE-CCA | no | yes
Weng et al. [22] | → | single | CDH on $\mathbb{G}_p$ | PRE-CCA | no | yes
This paper | ↔ | multi | DDH on $\mathbb{G}_p$ | PRE-CCA | no | no

Fig. 1. Comparison of our work with previous work

Lossy Trapdoor Functions. We review previous work related to LTDFs. Peikert and Waters proposed the notion of LTDFs [17]. They showed cryptographic applications of LTDFs, such as (ordinary) trapdoor functions, collision-resistant hash functions, oblivious transfer, and an IND-CCA encryption scheme. They also gave constructions of LTDFs based on the DDH assumption and the LWE assumption. They noted that Paillier encryption yields LTDFs by a methodology similar to the DDH-based construction.

Later, Rosen and Segev noted that the Damgård-Jurik encryption scheme, a generalization of Paillier encryption, directly satisfies the definition of LTDFs because of its number-theoretic properties [18]. In addition to [18], Freedman, Goldreich, Kiltz, Rosen, and Segev proposed further constructions of LTDFs, based on the d-Linear assumption and the QR assumption [8]. Mol and Yilek showed that slightly lossy TDFs suffice for constructing an IND-CCA secure public-key encryption scheme [14]. Slightly lossy TDFs lose only a (1 - ojflog «)) fraction of their input bits.

Other applications of LTDFs have been proposed. Rosen and Segev proposed a new primitive, one-way functions under correlated products HU-. They constructed such functions from LTDFs and used this primitive to build an IND-CCA secure encryption scheme. Boldyreva, Fehr, and O’Neill applied LTDFs to the construction of deterministic encryption a. They constructed a CCA-secure deterministic encryption scheme in the standard model, where CCA security refers to semantic security of messages rather than indistinguishability of messages. Bellare, Hofheinz, and Yilek formalized a new security notion for encryption, security under selective opening attack. It requires that unopened messages remain secret even if an adversary selectively obtains the messages and randomness of some ciphertexts 0, and they used LTDFs to achieve it. Nishimaki, Fujisaki, and Tanaka used all-but-one trapdoor functions for a universally composable commitment scheme [16]. They proposed the first non-interactive string-commitment scheme that is universally composable.

Organization 

In Section 2, we present the preliminaries needed to describe our scheme. In Section 3, we define re-applicable LTDFs. In Section 4, we review the definition of bidirectional PRE schemes, propose our scheme, and sketch its security proof. In Section 5, we construct re-applicable LTDFs based on the DDH assumption.
2 Preliminaries

In this section, we present the preliminaries needed to describe our scheme.

2.1 Notation 

Let S be a finite set. $s \xleftarrow{\$} S$ denotes that the element s is chosen from S uniformly at random. For a probabilistic algorithm A, $y \leftarrow A(x)$ denotes that A outputs y on input x with uniform randomness. If A runs in time polynomial in the security parameter, then A is a probabilistic polynomial-time (PPT) algorithm.

We say that a function $f : \mathbb{N} \to [0,1]$ is negligible in $\lambda \in \mathbb{N}$ if for every constant $c \in \mathbb{N}$ there exists $k_c \in \mathbb{N}$ such that $f(\lambda) < \lambda^{-c}$ for all $\lambda > k_c$. We say that a function $g : \mathbb{N} \to [0,1]$ is overwhelming in $\lambda \in \mathbb{N}$ if the function $f(\lambda) = 1 - g(\lambda)$ is negligible in $\lambda$.

Let $X_\lambda$ and $Y_\lambda$ denote random variables over a finite set $Z_\lambda \subset \{0,1\}^{\lambda}$, where $\lambda \in \mathbb{N}$ is the security parameter. We say that $X_\lambda$ and $Y_\lambda$ are (computationally) indistinguishable if, for every PPT distinguisher D, $|\Pr[D(X_\lambda) = 1] - \Pr[D(Y_\lambda) = 1]|$ is negligible in $\lambda$. We say that $X_\lambda$ and $Y_\lambda$ are statistically indistinguishable if $\sum_{z \in Z_\lambda} |\Pr[X_\lambda = z] - \Pr[Y_\lambda = z]|$ is negligible in $\lambda$.

2.2 DDH Assumption 

We review the DDH assumption. Let $\mathcal{G}$ be an algorithm that takes as input a security parameter λ and outputs a tuple $(p, \mathbb{G}, g)$, where p is a prime with $2^{\lambda-1} < p < 2^{\lambda}$, $\mathbb{G}$ is a cyclic group of prime order p, and g is a generator of $\mathbb{G}$.

Assumption 1 (The Decisional Diffie-Hellman Assumption). For any PPT adversary $\mathcal{A}$, the advantage $\mathrm{Adv}_{\mathcal{A}}(\lambda)$ is negligible in the security parameter λ, where

$$\mathrm{Adv}_{\mathcal{A}}(\lambda) = \big|\Pr[\mathcal{A}((p, \mathbb{G}, g), g^a, g^b, g^{ab}) = 1] - \Pr[\mathcal{A}((p, \mathbb{G}, g), g^a, g^b, g^c) = 1]\big|.$$

The probability is over the random choice of $(p, \mathbb{G}, g) \leftarrow \mathcal{G}(\lambda)$, the random choices of $a, b, c \in \mathbb{Z}_p$, and the random coins of $\mathcal{A}$.

2.3 All-But-One Trapdoor Functions 

We review all-but-one trapdoor functions, which we use in our scheme. All-but-one trapdoor functions can be constructed from the DDH assumption [17].

Definition 1 (All-but-one trapdoor functions). A collection of (n, k)-all-but-one trapdoor functions is a tuple of PPT algorithms $(G_{abo}, F_{abo}, F_{abo}^{-1})$ and a sequence of branch sets $B = \{B_\lambda\}$ such that:

All-but-one property: Given a lossy branch $b^* \in B_\lambda$, the algorithm $G_{abo}(1^\lambda, b^*)$ outputs a pair (s, td). For every $b \in B_\lambda \setminus \{b^*\}$, the algorithm $F_{abo}(s, b, \cdot)$ computes an injective function $f_{s,b}(\cdot)$ over $\{0,1\}^n$, and $F_{abo}^{-1}(td, b, \cdot)$ computes $f_{s,b}^{-1}(\cdot)$. For the lossy branch $b^*$, $F_{abo}(s, b^*, \cdot)$ computes a lossy function $f_{s,b^*}(\cdot)$ over $\{0,1\}^n$, where $|f_{s,b^*}(\{0,1\}^n)| \leq 2^{n-k}$.

Indistinguishability: For every $b_0^*, b_1^* \in B_\lambda$, the first output $s_0$ of $G_{abo}(1^\lambda, b_0^*)$ and the first output $s_1$ of $G_{abo}(1^\lambda, b_1^*)$ are computationally indistinguishable.
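Definition 1 can be illustrated with a toy Python sketch in the style of the DDH-based construction. It assumes the following:

- The index is an ElGamal-style matrix encryption of $-b^* I$, with per-row randomness and per-column secret exponents.
- Branches are elements of $\mathbb{Z}_{11}$.
- Evaluating on branch b first adds $bI$ homomorphically, so the function behaves like $x \cdot \mathrm{Enc}((b - b^*)I)$.
- The group is tiny and insecure.

All names are hypothetical. Every branch other than $b^*$ is inverted with the trapdoor, while the lossy branch has at most 11 images for 16 inputs.

```python
import random

# Toy group (insecure): order Q = 11 subgroup of Z_23^*, generator 4.
# Branches are elements of Z_Q.
P, Q, G = 23, 11, 4


def abo_gen(n, b_star, rng):
    """Index = matrix of ElGamal encryptions of -b* I (row randomness r_i, key z_j)."""
    r = [rng.randrange(1, Q) for _ in range(n)]
    z = [rng.randrange(1, Q) for _ in range(n)]
    s = [[pow(G, (r[i] * z[j] - (b_star if i == j else 0)) % Q, P)
          for j in range(n)] for i in range(n)]
    heads = [pow(G, ri, P) for ri in r]
    return (heads, s), z  # (index, trapdoor)


def abo_eval(index, b, x):
    """Add b*I homomorphically (multiply diagonal entries by g^b), then apply x.
    Output: (g^<r,x>, (g^(<r,x> z_j + (b - b*) x_j))_j)."""
    heads, s = index
    n = len(x)
    gb = pow(G, b, P)
    c0, cols = 1, [1] * n
    for i in range(n):
        if x[i]:
            c0 = c0 * heads[i] % P
            for j in range(n):
                entry = s[i][j] * (gb if i == j else 1) % P
                cols[j] = cols[j] * entry % P
    return c0, tuple(cols)


def abo_invert(z, y):
    """For b != b*, cols[j] / c0^(z_j) = g^((b - b*) x_j), which is 1 iff x_j = 0."""
    c0, cols = y
    return [0 if cols[j] == pow(c0, z[j], P) else 1 for j in range(len(z))]
```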
3 Re-applicable Lossy Trapdoor Functions

In this section, we propose a new primitive, re-applicable LTDFs, which is an extension of LTDFs. Peikert and Waters proposed LTDFs and all-but-one functions in STOC’08 [17].

For our purpose, we modify LTDFs in several respects. First, we add one algorithm, the parameter-generation algorithm ParGen. This algorithm generates public parameters that are common to all algorithms and used in every evaluation. We introduce ParGen because PRE schemes are used in a multi-user setting rather than a single-user setting. In addition, the validity checks of ciphertexts require that, for every user, the ElGamal ciphertexts in an index of LTDFs share common randomness.

Second, we modify LTDFs so that the function-generation algorithm receives a tag from a set of tags $\mathcal{T}$ instead of an "injective" or "lossy" command. For every tag τ except one special lossy tag $\tau_{los}$, the function-generation algorithm outputs an index that represents an injective function. Given $\tau_{los}$, on the other hand, it outputs an index that represents a lossy function.
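The first two modifications can be illustrated together with a toy Python sketch. The paper's own DDH construction (Section 5) is outside this excerpt, so the following are assumptions of the sketch:

- A hypothetical ParGen publishes $g^{r_1}, \ldots, g^{r_n}$.
- Each user builds an index encrypting $\tau I$ by raising these shared values to the user's own exponents $z_j$.
- The lossy tag is $\tau_{los} = 0$.

With these choices, two users share the first component of every evaluation, each user can invert with their own trapdoor, and the lossy tag collapses the image.

```python
import random

# Toy group (insecure): order Q = 11 subgroup of Z_23^*, generator 4.
# Tags are elements of Z_Q.
P, Q, G = 23, 11, 4
TAU_LOS = 0  # assumed lossy tag for this illustration


def par_gen(n, rng):
    """Hypothetical ParGen: row randomness g^(r_i), shared by every user."""
    return [pow(G, rng.randrange(1, Q), P) for _ in range(n)]


def lossy_gen(heads, tau, rng):
    """Per-user index Enc(tau * I): entry (i, j) = (g^(r_i))^(z_j) * g^(tau * [i == j])."""
    n = len(heads)
    z = [rng.randrange(1, Q) for _ in range(n)]
    rows = [[pow(heads[i], z[j], P) * pow(G, tau if i == j else 0, P) % P
             for j in range(n)] for i in range(n)]
    return rows, z  # (index, trapdoor)


def lossy_eval(heads, rows, x):
    """Output (g^<r,x>, (g^(<r,x> z_j + tau x_j))_j)."""
    n = len(x)
    c0, cols = 1, [1] * n
    for i, bit in enumerate(x):
        if bit:
            c0 = c0 * heads[i] % P
            for j in range(n):
                cols[j] = cols[j] * rows[i][j] % P
    return c0, tuple(cols)


def lossy_inv(z, y):
    """For tau != 0: cols[j] / c0^(z_j) = g^(tau x_j), which is 1 iff x_j = 0."""
    c0, cols = y
    return [0 if cols[j] == pow(c0, z[j], P) else 1 for j in range(len(z))]
```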

Third, we define five new algorithms ReIndex, ReEval, PrivReEval, Trans, and FakeKey. ReIndex, ReEval, PrivReEval, and Trans are deterministic, and FakeKey is probabilistic. We apply ReIndex to generate re-encryption keys, ReEval to evaluate re-encryption, and PrivReEval to check the validity of ciphertexts. The algorithms Trans and FakeKey are used only in the proof. The algorithm Trans guarantees transitivity between re-encryption keys; in other words, we can make a re-encryption key $rk_{j\leftrightarrow k}$ from $rk_{i\leftrightarrow j}$ and $rk_{i\leftrightarrow k}$. The algorithm FakeKey generates a pair of a public key and a re-encryption key $(pk_j, rk_{i\leftrightarrow j})$ from another public key $pk_i$. Moreover, even if $pk_i$ represents a lossy function, FakeKey always outputs a $pk_j$ that represents an injective function. This property is necessary for the last modification in our proof, and we introduce $\mathcal{T}$ to provide it.

We call this new primitive a collection of re-applicable LTDFs. These are LTDFs specialized for PRE. If we merge ParGen and LossyGen, ignore the other new algorithms, and define $\mathcal{T} = \{\tau_{inj}, \tau_{los}\}$, we can regard this new primitive as the (ordinary) LTDFs proposed by Peikert and Waters.

Definition 2 (Re-applicable LTDFs with respect to function indices). Let (ParGen, LossyGen, LossyEval, LossyInv, ReIndex, ReEval, PrivReEval, Trans, FakeKey) be a tuple of PPT algorithms, and let $\mathcal{T}$ be a set of tags that contains one lossy element $\tau_{los}$. The algorithm ParGen$(1^\lambda)$ outputs a public parameter par. The other algorithms use the parameter par in their computations. Hereafter, we omit the input par to the algorithms.

A collection of re-applicable $(n, k)$-lossy trapdoor functions with respect to function indices is a tuple of the PPT algorithms (ParGen, LossyGen, LossyEval, LossyInv, ReIndex, ReEval, PrivReEval, Trans, FakeKey) such that:

Injectivity: For every public parameter par $\leftarrow$ ParGen$(1^\lambda)$ and every tag $\tau \in \mathcal{T} \setminus \{\tau_{los}\}$, LossyGen$(\tau)$ outputs a pair $(s, td)$ of a function index and its trapdoor, LossyEval$(s, \cdot)$ computes an injective function $f_{s,\tau}(\cdot)$ over $\{0,1\}^n$, and LossyInv$(td, \tau, \cdot)$ computes $f^{-1}_{s,\tau}(\cdot)$.
(We write $f_{s,\tau}$, not $f_s$, in order to make the tag $\tau$ explicit. If we do not need to make the tag explicit, we write the function as $f_{s,*}$.)

Lossiness: For every public parameter par $\leftarrow$ ParGen$(1^\lambda)$, the algorithm LossyGen$(\tau_{los})$ outputs $(s, \perp)$ and LossyEval$(s, \cdot)$ computes a function $f_{s,\tau_{los}}(\cdot)$ over $\{0,1\}^n$, where $|f_{s,\tau_{los}}(\{0,1\}^n)| \le 2^{n-k}$.

Indistinguishability between injective and lossy indices: Let $X_\lambda$ denote the distribution of $(par, s_{inj}, \tau)$, and let $Y_\lambda$ denote the distribution of $(par, s_{los}, \tau')$, where par is a public parameter from ParGen$(1^\lambda)$, $\tau$ and $\tau'$ are random elements of $\mathcal{T}$, and the function indices $s_{inj}$ and $s_{los}$ are the first elements of the outputs of LossyGen$(\tau)$ and LossyGen$(\tau_{los})$, respectively. Then, $\{X_\lambda\}$ and $\{Y_\lambda\}$ are computationally indistinguishable.

Re-applying with respect to function indices: Let $\tau_i$ and $\tau_j$ be any tags with $\tau_i \ne \tau_{los}$ and $\tau_j \ne \tau_{los}$. The algorithm ReIndex$(td_i, td_j)$ outputs $s_{i\leftrightarrow j}$, where $td_i$ and $td_j$ are the second elements of the outputs of LossyGen$(\tau_i)$ and LossyGen$(\tau_j)$. Then, for every $x \in \{0,1\}^n$, $x = \text{LossyInv}(td_j, \tau_i, \text{ReEval}(s_{i\leftrightarrow j}, \text{LossyEval}(s_i, x)))$. We remark that LossyInv takes $\tau_i$ as one of its inputs, not $\tau_j$.

Generating proper outputs: Let $c$ be an output of ReEval$(s_{i\leftrightarrow j}, \text{LossyEval}(s_i, x))$, where $s_{i\leftrightarrow j}$ and $s_i$ have the same meaning as in the above paragraph. Then, PrivReEval$(x, \tau_i, \tau_j, s_j)$ outputs the same $c$, where $x$, $\tau_i$, $\tau_j$, and $s_j$ also have the same meaning as in the above paragraph. That is, ReEval$(s_{i\leftrightarrow j}, \text{LossyEval}(s_i, \cdot))$ and PrivReEval$(\cdot, \tau_i, \tau_j, s_j)$ are equivalent as functions (i.e., any output of ReEval$(s_{i\leftrightarrow j}, \text{LossyEval}(s_i, \cdot))$ is independent of $s_i$).

Transitivity: Let $(s_i, td_i)$, $(s_j, td_j)$, and $(s_k, td_k)$ be outputs of LossyGen$(\tau_i)$, LossyGen$(\tau_j)$, and LossyGen$(\tau_k)$, and let $s_{i\leftrightarrow j}$ and $s_{i\leftrightarrow k}$ be the outputs of ReIndex$(td_i, td_j)$ and ReIndex$(td_i, td_k)$, respectively. Then, Trans$(s_{i\leftrightarrow j}, s_{i\leftrightarrow k})$ outputs $s_{j\leftrightarrow k}$, which is the same as the output of ReIndex$(td_j, td_k)$.

Statistical indistinguishability of the fake key: The algorithm FakeKey$(s_i, \tau_i)$ outputs $(s'_j, s'_{i\leftrightarrow j}, \tau'_j)$, where $s_i$ is the first element of an output of LossyGen$(\tau_i)$.

Let $X_\lambda$ denote the distribution of $(par, s_i, s_j, s_{i\leftrightarrow j}, \tau_i, \tau_j)$, and let $Y_\lambda$ denote the distribution of $(par, s_i, s'_j, s'_{i\leftrightarrow j}, \tau_i, \tau'_j)$, where par, $s_j$, $s_{i\leftrightarrow j}$, and $\tau_j$ have the same meaning as in the above paragraphs. Then, $\{X_\lambda\}$ and $\{Y_\lambda\}$ are statistically indistinguishable.

Generation of injective functions from lossy functions: Let $s$ be the first element of an output of FakeKey$(s_{los}, \tau)$, where $\tau$ is a tag and $s_{los}$ is the first element of an output of LossyGen$(\tau_{los})$. Then, for every $\tau$, LossyEval$(s, \cdot)$ represents an injective function $f_{s,*}$ with overwhelming probability over the randomness of FakeKey$(s_{los}, \tau)$. (We do not require any other property of the index $s$ beyond the injectivity of $f_{s,*}$. FakeKey does not output any trapdoor information for $f_{s,*}$.)


4 Bidirectional and Multi-Hop PRE-CCA Scheme 

In this section, we first review the definition of a bidirectional PRE scheme. Then, we 
describe our scheme and show a sketch of the proof. 

A bidirectional PRE scheme consists of six algorithms $\Pi$ = (Setup, KeyGen, Enc, Dec, ReKeyGen, ReEnc) as follows.

PP $\leftarrow$ Setup$(1^\lambda)$: Given a security parameter $1^\lambda$, the setup algorithm outputs a public parameter PP. This algorithm is executed by a trusted third party.

$(pk, sk) \leftarrow$ KeyGen(PP): Given a public parameter PP, the key generation algorithm outputs a public key $pk$ and a secret key $sk$.

$C \leftarrow$ Enc(PP, $pk$, $m$): Given a public key $pk$ and a message $m \in \mathcal{M}$, the encryption algorithm outputs a ciphertext $C$, where $\mathcal{M}$ is the message space.

$rk_{i\leftrightarrow j} \leftarrow$ ReKeyGen(PP, $sk_i$, $sk_j$): Given a pair of secret keys $sk_i, sk_j$, where $i \ne j$, this algorithm outputs a re-encryption key $rk_{i\leftrightarrow j}$. We call $rk_{i\leftrightarrow j}$ the re-encryption key between $i$ and $j$.

$C_j \leftarrow$ ReEnc(PP, $rk_{i\leftrightarrow j}$, $C_i$): Given a re-encryption key $rk_{i\leftrightarrow j}$ between $i$ and $j$ and a ciphertext $C_i$ for $i$, this algorithm outputs another ciphertext $C_j$ for $j$ or the error symbol $\perp$.

$m \leftarrow$ Dec(PP, $sk$, $C$): Given a secret key $sk$ and a ciphertext $C$, the decryption algorithm outputs a message $m$ or the error symbol $\perp$.

If the following two conditions hold, we say that the PRE scheme $\Pi$ satisfies correctness. First, for every PP output by Setup$(1^\lambda)$, every $(pk, sk)$ output by KeyGen(PP), and every message $m \in \mathcal{M}$, the probability $\Pr[C \leftarrow \text{Enc}(PP, pk, m) : \text{Dec}(PP, sk, C) = m]$ is overwhelming. Second, for every natural number $n \in \mathbb{N}$, every PP output by Setup$(1^\lambda)$, every $(pk_1, sk_1), \ldots, (pk_n, sk_n)$ output by KeyGen(PP), every message $m \in \mathcal{M}$, and every $rk_{1\leftrightarrow 2}, \ldots, rk_{n-1\leftrightarrow n}$ such that $rk_{i\leftrightarrow i+1}$ is output by ReKeyGen$(sk_i, sk_{i+1})$ for each $i \in [1, n-1]$, the probability

$$\Pr[C_1 \leftarrow \text{Enc}(PP, pk_1, m) : \text{Dec}(PP, sk_n, \text{ReEnc}(PP, rk_{n-1\leftrightarrow n}, \ldots \text{ReEnc}(PP, rk_{1\leftrightarrow 2}, C_1) \cdots)) = m]$$

is overwhelming.

4.1 Bidirectional and Multi-Hop PRE-CCA Security 

We prove that our scheme satisfies PRE-CCA security in the full version of this paper. This security notion was proposed by Canetti and Hohenberger [6].

Definition 3 (Bidirectional and Multi-Hop PRE-CCA Security). Let $\lambda$ be the security parameter, $\mathcal{A}$ be an oracle TM representing the adversary, and $\Gamma_U$ and $\Gamma_C$ be data structures. The data structures $\Gamma_U$ and $\Gamma_C$ are initialized as empty at the beginning of the game. The game consists of an execution of $\mathcal{A}$ with the following oracles, which can be invoked multiple times in any order, subject to the constraints below:

Setup oracle: This oracle can be queried only once, as the first query of the game. It generates a public parameter PP $\leftarrow$ Setup$(1^\lambda)$, and $\mathcal{A}$ is given PP.

Uncorrupted key generation: This oracle generates a new key pair $(pk, sk) \leftarrow$ KeyGen(PP) and adds $pk$ to $\Gamma_U$, where PP is generated by the setup oracle. $\mathcal{A}$ is given $pk$.

Corrupted key generation: This oracle generates a new key pair $(pk, sk) \leftarrow$ KeyGen(PP) and adds $pk$ to $\Gamma_C$, where PP is generated by the setup oracle. $\mathcal{A}$ is given $(pk, sk)$.

Challenge oracle: This oracle can be queried only once. On input $(pk^*, m_0, m_1)$, the oracle chooses a bit $b \leftarrow \{0,1\}$ and returns $C^* = \text{Enc}(PP, pk^*, m_b)$. We call $pk^*$ the challenge key and $C^*$ the challenge ciphertext. (We require $pk^* \in \Gamma_U$ for $\mathcal{A}$ to win.)

Re-encryption key generation: On input $(pk_i, pk_j)$ from the adversary, this oracle returns the re-encryption key $rk_{i\leftrightarrow j}$ = ReKeyGen$(sk_i, sk_j)$, where $sk_i$ and $sk_j$ are the secret keys that correspond to $pk_i$ and $pk_j$, respectively. We require that $pk_i$ and $pk_j$ are both in $\Gamma_C$, or alternatively both in $\Gamma_U$. We do not allow re-encryption key generation queries between a corrupted key and an uncorrupted key.

Re-encryption oracle: On input $(pk_i, pk_j, C_i)$, if $pk_j \in \Gamma_C$ and $(pk_i, C_i)$ is a derivative of $(pk^*, C^*)$, then return a special symbol $\perp$, which is not in the domain of messages or ciphertexts. Else, return the re-encrypted ciphertext $C_j$ = ReEnc(ReKeyGen$(sk_i, sk_j)$, $C_i$). Derivatives of $(pk^*, C^*)$ are defined inductively as follows.

- $(pk^*, C^*)$ is a derivative of itself.

- If $(pk, C)$ is a derivative of $(pk^*, C^*)$, and $(pk', C')$ is a derivative of $(pk, C)$, then $(pk', C')$ is a derivative of $(pk^*, C^*)$.

- If $\mathcal{A}$ has queried the re-encryption oracle on input $(pk, pk', C)$ and obtained the response $C'$, then $(pk', C')$ is a derivative of $(pk, C)$.

- If $\mathcal{A}$ has queried the re-encryption key generation oracle on input $(pk, pk')$ or $(pk', pk)$, and $C'$ = ReEnc(ReKeyGen$(sk, sk')$, $C$), then $(pk', C')$ is a derivative of $(pk, C)$, where $sk$ and $sk'$ are the secret keys that correspond to $pk$ and $pk'$, respectively.
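The inductive definition of derivatives amounts to a reachability closure that a challenger has to maintain during the game. The following Python sketch is only an illustration under our own assumptions: the class `DerivativeTracker`, its method names, and the `reenc` callback (standing for ReEnc(ReKeyGen$(sk, sk')$, $C$)) are hypothetical, and the integer "ciphertexts" in the demo are a toy stand-in, not part of the definition by Canetti and Hohenberger.

```python
from typing import Callable, Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # (public key, ciphertext)


class DerivativeTracker:
    """Hypothetical challenger-side bookkeeping for derivatives of (pk*, C*)."""

    def __init__(self, pk_star: Hashable, c_star: Hashable,
                 reenc: Callable[[Hashable, Hashable, Hashable], Hashable],
                 max_rounds: int = 1000):
        self.derivs: Set[Pair] = {(pk_star, c_star)}  # rule 1: (pk*, C*) itself
        self.reenc = reenc          # reenc(pk, pk2, C) = ReEnc(ReKeyGen(sk, sk2), C)
        self.revealed = set()       # unordered pairs {pk, pk2} sent to the rk oracle
        self.answers = {}           # (pk, C) -> {(pk2, C2)} answered by the re-enc oracle
        self.max_rounds = max_rounds

    def record_reencryption_query(self, pk, pk2, c, c2) -> None:
        """Rule 3: the re-encryption oracle answered C2 on input (pk, pk2, C)."""
        self.answers.setdefault((pk, c), set()).add((pk2, c2))
        self._close()

    def record_rekey_query(self, pk, pk2) -> None:
        """Rule 4: the key between pk and pk2 was revealed (either order)."""
        self.revealed.add(frozenset((pk, pk2)))
        self._close()

    def _close(self) -> None:
        """Rule 2 (transitivity): extend the set until it reaches a fixed point."""
        for _ in range(self.max_rounds):
            new: Set[Pair] = set()
            for pk, c in self.derivs:
                new |= self.answers.get((pk, c), set())
                for pair in self.revealed:
                    if pk in pair:
                        for pk2 in pair - {pk}:
                            new.add((pk2, self.reenc(pk, pk2, c)))
            if new <= self.derivs:
                return
            self.derivs |= new
        raise RuntimeError("derivative closure did not stabilise")

    def is_derivative(self, pk, c) -> bool:
        return (pk, c) in self.derivs


if __name__ == "__main__":
    # Toy, path-independent "re-encryption": shift an integer by the key difference.
    tracker = DerivativeTracker(1, 101, lambda pk, pk2, c: c - pk + pk2)
    tracker.record_rekey_query(1, 2)                    # makes (2, 102) a derivative
    tracker.record_reencryption_query(2, 3, 102, 103)   # makes (3, 103) a derivative
    print(tracker.is_derivative(3, 103), tracker.is_derivative(3, 999))  # prints True False
```

The `max_rounds` guard is a design choice of the sketch: the closure only terminates if the set of reachable pairs is finite, which holds for the toy shift above but need not hold for an arbitrary randomized `reenc`.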

Decryption oracle: On input $(pk, C)$, if the pair $(pk, C)$ is a derivative of the challenge key and ciphertext $(pk^*, C^*)$, or $pk$ is not in $\Gamma_U \cup \Gamma_C$, then return a special symbol $\perp$, which is not in the domain of messages. Else, return Dec$(sk, C)$, where $sk$ is the secret key that corresponds to $pk$.

Decision oracle: This oracle can be queried at the end of the game. On input $b'$: if $b' = b$ and the challenge key $pk^* \in \Gamma_U$, then output 1. Else, output 0.

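We denote the output of the decision oracle in the above game by $\text{Expt}^{\text{PRE-CCA}}_{\mathcal{A},\Pi}(\lambda)$ for an adversary $\mathcal{A}$ and a scheme $\Pi$. We define the advantage of adversary $\mathcal{A}$ as

$$\text{Adv}^{\text{PRE-CCA}}_{\mathcal{A},\Pi}(\lambda) \stackrel{\text{def}}{=} \left| \Pr[\text{Expt}^{\text{PRE-CCA}}_{\mathcal{A},\Pi}(\lambda) = 1] - \frac{1}{2} \right|,$$

where the probability is over the random choices of $\mathcal{A}$ and the oracles. We say that the scheme $\Pi$ is secure under the bidirectional PRE-CCA attack if, for every adversary $\mathcal{A}$, $\text{Adv}^{\text{PRE-CCA}}_{\mathcal{A},\Pi}(\lambda)$ is negligible in the security parameter $\lambda$.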

4.2 Description of Our Scheme 

We next describe our scheme. Let $\lambda$ be the security parameter, and let $n$, $k$, $k'$, $k''$, and $v$ be parameters depending on $\lambda$. Let (SigGen, SigSign, SigVer) be a strongly unforgeable one-time signature scheme whose verification keys are in $\{0,1\}^v$. Let (ParGen, LossyGen, LossyEval, LossyInv, ReIndex, ReEval, PrivReEval, Trans, FakeKey) be a collection of re-applicable $(n, k)$-LTDFs and $\mathcal{T}$ be a set of tags. Let $(G_{abo}, F_{abo}, F^{-1}_{abo})$ be a collection of $(n, k')$-ABO trapdoor functions with branches $B_\lambda = \{0,1\}^v$, which contain the set of signature verification keys. Let $\mathcal{H}$ be a family of pairwise independent hash functions from $\{0,1\}^n$ to $\{0,1\}^{k''}$. We require that the above parameters satisfy $(k + k') - (k'' + n) \ge \delta = \delta_1 + \delta_2$ for some $\delta_1 = \omega(\log \lambda)$ and $\delta_2 = \omega(\log \lambda)$. Our cryptosystem has message space $\{0,1\}^{k''}$.
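To see what the parameter inequality means in bits, here is a small Python helper. The function names are hypothetical, and the numbers in the demo are assumptions chosen for illustration: a 256-bit group order $p$, lossiness $n - \log p$ for both collections (as for the DDH-based LTDF of Section 5; assumed here for the ABO collection as well), and $\delta_1 = \delta_2 = 64$ standing in for the asymptotic $\omega(\log \lambda)$ terms.

```python
def params_ok(n: int, k: int, k_abo: int, k_msg: int, delta: int) -> bool:
    """Check (k + k') - (k'' + n) >= delta for concrete bit lengths."""
    return (k + k_abo) - (k_msg + n) >= delta


def max_message_bits(n: int, k: int, k_abo: int, delta: int) -> int:
    """Largest k'' satisfying the inequality (a negative result means none exists)."""
    return k + k_abo - n - delta


if __name__ == "__main__":
    n, log_p = 1024, 256          # hypothetical input length and group-order size
    k = k_abo = n - log_p         # assumed lossiness of both collections
    delta = 64 + 64               # hypothetical delta_1 + delta_2
    print(max_message_bits(n, k, k_abo, delta))  # prints 384
```

Under these assumptions, each collection loses $768$ bits, and the remaining slack after paying for $n$ and $\delta$ bounds the message length $k''$.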
The algorithm Setup generates a public parameter PP $= (s_{abo}, par, h)$, and the algorithm KeyGen makes a pair of keys $(pk, sk) = ((s_{rltdf}, \tau), (td_{rltdf}, s_{rltdf}, \tau))$. Apart from the tag $\tau$, Setup and KeyGen together can be regarded as the key generation algorithm of the Peikert and Waters encryption. The algorithm Enc is also the same as the encryption algorithm of the Peikert and Waters encryption, except that $c_1$ is not signed, so that it can be re-encrypted. The algorithm ReKeyGen makes a re-encryption key, and ReEnc re-encrypts a ciphertext into another ciphertext. These algorithms use only ReIndex and ReEval of the re-applicable LTDFs. The algorithm Dec is the same as the decryption algorithm of the Peikert and Waters encryption, except that, if a ciphertext has been re-encrypted, it applies PrivReEval for the validity check of ciphertexts.

Setup$(1^\lambda)$: Setup$(1^\lambda)$ first generates an index of all-but-one trapdoor functions with lossy branch $0^v$: $(s_{abo}, td_{abo}) \leftarrow G_{abo}(1^\lambda, 0^v)$. Then, it generates a public parameter of re-applicable LTDFs: par $\leftarrow$ ParGen$(1^\lambda)$. Finally, it chooses a hash function $h \leftarrow \mathcal{H}$. It outputs the public parameter PP $= (s_{abo}, par, h)$.

(The algorithm Setup erases the trapdoor $td_{abo}$ because the following algorithms do not use it.)

KeyGen(PP): KeyGen takes PP $= (s_{abo}, par, h)$ as input. It chooses a tag $\tau \in \mathcal{T} \setminus \{\tau_{los}\}$ and generates an injective index of re-applicable LTDFs: $(s_{rltdf}, td_{rltdf}) \leftarrow$ LossyGen$(\tau)$. The public key consists of the injective function index and the tag, and the secret key consists of the trapdoor of $s_{rltdf}$, the index $s_{rltdf}$, and the tag: $pk = (s_{rltdf}, \tau)$ and $sk = (td_{rltdf}, s_{rltdf}, \tau)$.

Enc(PP, $pk$, $m$): Enc takes (PP, $pk$, $m$) as input, where PP $= (s_{abo}, par, h)$ is a tuple of public parameters, $pk = (s_{rltdf}, \tau)$ is a public key, and $m \in \{0,1\}^{k''}$ is a message. It chooses $x \in \{0,1\}^n$ uniformly at random. It generates a key pair for the one-time signature scheme, $(vk, sk_\sigma) \leftarrow$ SigGen$(1^\lambda)$, then computes

$$c_1 = \text{LossyEval}(s_{rltdf}, x), \qquad c_2 = F_{abo}(s_{abo}, vk, x), \qquad c_3 = h(x) \oplus m.$$

Finally, it signs the tuple $(c_2, c_3, \tau)$ as $\sigma \leftarrow$ SigSign$(sk_\sigma, (c_2, c_3, \tau))$ and outputs the ciphertext $C = (vk, c_1, c_2, c_3, \tau, \sigma)$.

ReKeyGen(PP, $sk_i$, $sk_j$): ReKeyGen takes (PP, $sk_i$, $sk_j$) as input, where $(sk_i, sk_j) = ((td_i, s_i, \tau_i), (td_j, s_j, \tau_j))$. It computes $s_{i\leftrightarrow j} \leftarrow$ ReIndex$(td_i, td_j)$, then outputs the re-encryption key $rk_{i\leftrightarrow j} = s_{i\leftrightarrow j}$.

ReEnc(PP, $rk_{i\leftrightarrow j}$, $C_i$): ReEnc takes $(rk_{i\leftrightarrow j}, C_i)$ as input, where $rk_{i\leftrightarrow j} = s_{i\leftrightarrow j}$ is a re-encryption key and $C_i = (vk, c_{1,i}, c_2, c_3, \tau, \sigma)$ is a ciphertext. It computes $c_{1,j} \leftarrow$ ReEval$(s_{i\leftrightarrow j}, c_{1,i})$. It then outputs $C_j = (vk, c_{1,j}, c_2, c_3, \tau, \sigma)$ as a new ciphertext for the user with $sk_j$.

Dec(PP, $sk$, $C$): Dec takes (PP, $sk$, $C$) as input, where PP $= (s_{abo}, par, h)$ is a tuple of public parameters, $sk = (td_{rltdf}, s_{rltdf}, \tau)$ is a secret key, and $C = (vk, c_1, c_2, c_3, \tau', \sigma)$ is a ciphertext. It first checks SigVer$(vk, (c_2, c_3, \tau'), \sigma) = 1$; if not, it outputs $\perp$.

It then computes $x$ = LossyInv$(td_{rltdf}, \tau', c_1)$. If $\tau = \tau'$, it checks $c_1$ = LossyEval$(s_{rltdf}, x)$; otherwise, it checks PrivReEval$(x, \tau', \tau, s_{rltdf}) = c_1$; if the check fails, it outputs $\perp$. It also checks $c_2 = F_{abo}(s_{abo}, vk, x)$; if not, it outputs $\perp$. Finally, it outputs $m = c_3 \oplus h(x)$. (We note that, if $C$ was not re-encrypted, then $\tau = \tau'$. On the other hand, if $C$ was re-encrypted, then $\tau \ne \tau'$.)
4.3 Security of Our Scheme

In this section, we state the following theorem and give a sketch of the proof. We give the detailed proof in the full version of this paper.

Theorem 1. The above proposed scheme satisfies the PRE-CCA security. 

We now present a sequence of games from Game$_0$ to Game$_{10}$ to prove Theorem 1. Game$_0$ is identical to the PRE-CCA game. In Game$_{10}$, no adversary can win with non-negligible advantage. For each $i \in [0, 9]$, the modification from Game$_i$ to Game$_{i+1}$ is perfectly, statistically, or computationally indistinguishable, except for Game$_7$, which reduces the advantage only by a polynomial factor, as discussed below. Therefore, we conclude that no adversary can win with non-negligible advantage in Game$_0$ either.

In every game, let $C^* = (vk^*, c_1^*, c_2^*, c_3^*, \tau^*, \sigma^*)$ and $pk^* = (s^*_{rltdf}, \tau^*)$ be the challenge ciphertext and the challenge public key, respectively.

Game$_0$: This game is identical to the PRE-CCA game.

Game$_1$: Let $x^*$ denote the random input used to make the challenge ciphertext (i.e., $c_1^* = \text{LossyEval}(s^*_{rltdf}, x^*)$, $c_2^* = F_{abo}(s_{abo}, vk^*, x^*)$, and $c_3^* = h(x^*) \oplus m_b$).

Then, we modify the decryption oracle as follows. The decryption oracle is given a decryption query $(pk, C) = ((s, \tau), (vk, c_1, c_2, c_3, \tau', \sigma))$, where $pk$ is restricted to an output of the corrupted or the uncorrupted key generation oracle. If $(pk, C)$ is a derivative of $(pk^*, C^*)$, then it outputs $\perp$. Else, if the decryption query satisfies $(vk, c_2, c_3, \tau', \sigma) = (vk^*, c_2^*, c_3^*, \tau^*, \sigma^*)$ and PrivReEval$(x^*, \tau', \tau, s) = c_1$, then it outputs $m \leftarrow c_3 \oplus h(x^*)$. Otherwise, it outputs Dec$(sk, C)$ by the ordinary decryption process.

This modification does not affect the success probability of an adversary. By the injectivity of PrivReEval$(\cdot, \tau', \tau, s)$, if PrivReEval$(x^*, \tau', \tau, s) = c_1$, then we obtain LossyInv$(td, \tau', c_1) = x^*$. In fact, the probability that such a query satisfies PrivReEval$(x^*, \tau', \tau, s) = c_1$ is negligible; we discuss this fact in the modification between Game$_9$ and Game$_{10}$. However, this check is necessary for the subsequent modifications.

Game$_2$: We add the following check to the decryption oracle after the derivative check. The decryption oracle is given a decryption query $(pk, C) = ((s, \tau), (vk, c_1, c_2, c_3, \tau', \sigma))$. The decryption oracle always outputs $\perp$ if $vk = vk^*$ and $(c_2, c_3, \tau', \sigma) \ne (c_2^*, c_3^*, \tau^*, \sigma^*)$.

This modification changes the success probability of an adversary only negligibly, by the strong existential unforgeability of the one-time signature scheme.

Game$_3$: Let $vk^*$ be the verification key used by the challenge oracle. We modify the lossy branch chosen by the setup oracle as follows: we replace $(s_{abo}, td_{abo}) \leftarrow G_{abo}(1^\lambda, 0^v)$ with $(s_{abo}, td_{abo}) \leftarrow G_{abo}(1^\lambda, vk^*)$. That is, the lossy branch is changed from the all-zero string $0^v$ to the verification key $vk^*$.

This modification changes the success probability of an adversary only negligibly, by the computational indistinguishability of all-but-one trapdoor functions.


CCA Proxy Re-Encryption without Bilinear Maps in the Standard Model 273 


Game$_4$: Let $(s_{abo}, td_{abo})$ be the index of all-but-one trapdoor functions and its trapdoor. We modify the decryption oracle so that it uses the trapdoor $td_{abo}$ to decrypt a ciphertext $C = (vk, c_1, c_2, c_3, \tau', \sigma)$ that it receives. That is, in the case where $(vk, c_2, c_3, \tau', \sigma) \ne (vk^*, c_2^*, c_3^*, \tau^*, \sigma^*)$ and $vk \ne vk^*$, we replace $x \leftarrow$ LossyInv$(td_{rltdf}, \tau', c_1)$ with $x \leftarrow F^{-1}_{abo}(td_{abo}, vk, c_2)$ and proceed with the decryption process. (In the other cases, the decryption oracle executes the operations defined in Game$_1$ and Game$_2$.)

This modification does not affect the success probability of an adversary because of the injectivity of LossyEval$(s, \cdot)$, $F_{abo}(s_{abo}, vk, \cdot)$, and PrivReEval$(\cdot, \tau', \tau, s)$. Here, $F_{abo}(s_{abo}, vk, \cdot)$ is an injective function since $vk$ is not the lossy branch $vk^*$.

In the following games, the challenger manages uncorrupted re-encryption keys using a table. That is, at the beginning of this game, the challenger creates an empty table that stores uncorrupted re-encryption keys. Let $(pk_1, sk_1)$ be the first output of the uncorrupted key generation oracle. At the $n$-th query, the uncorrupted key generation oracle outputs a pair of keys $(pk_n, sk_n)$ and makes the re-encryption keys $rk_{1\leftrightarrow n}, \ldots, rk_{n-1\leftrightarrow n}$ and $rk_{n\leftrightarrow 1}, \ldots, rk_{n\leftrightarrow n-1}$. The challenger adds these re-encryption keys to the table.
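The table bookkeeping can be sketched in a few lines of Python. This is a hypothetical illustration: `RekeyTable` and its methods are our own names, the `rekeygen` callback stands for ReKeyGen, and the demo uses bare trapdoor vectors as secret keys with a toy modulus, mimicking the ReIndex of Section 5 ($td_j - td_i$), so that $rk_{i\leftrightarrow n}$ and $rk_{n\leftrightarrow i}$ differ and both directions are stored.

```python
class RekeyTable:
    """Hypothetical challenger table of re-encryption keys among uncorrupted users."""

    def __init__(self, rekeygen):
        self.rekeygen = rekeygen   # rekeygen(sk_i, sk_j) -> rk_{i<->j}
        self.secret_keys = []      # sk_1, sk_2, ... in the order they were generated
        self.table = {}            # (i, j) -> rk_{i<->j}, users numbered from 1

    def on_uncorrupted_keygen(self, sk_n):
        """Record the n-th uncorrupted key and store rk_{i<->n} and rk_{n<->i}."""
        self.secret_keys.append(sk_n)
        n = len(self.secret_keys)
        for i in range(1, n):
            sk_i = self.secret_keys[i - 1]
            self.table[(i, n)] = self.rekeygen(sk_i, sk_n)
            self.table[(n, i)] = self.rekeygen(sk_n, sk_i)
        return n

    def lookup(self, i, j):
        return self.table[(i, j)]


if __name__ == "__main__":
    P = 1019  # toy modulus; the rekeygen below mirrors ReIndex: td_j - td_i (mod P)
    table = RekeyTable(lambda a, b: tuple((y - x) % P for x, y in zip(a, b)))
    for td in [(1, 2), (10, 20), (5, 7)]:
        table.on_uncorrupted_keygen(td)
    print(len(table.table), table.lookup(1, 3))  # prints 6 (4, 5)
```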

Game$_5$: We modify the re-encryption oracle as follows. The re-encryption oracle takes a query $(pk_a, pk_b, C_a) = ((s_a, \tau_a), (s_b, \tau_b), (vk, c_{1,a}, c_2, c_3, \tau'_a, \sigma))$. If $pk_a$ is corrupted, then it evaluates $x \leftarrow$ LossyInv$(td_a, \tau'_a, c_{1,a})$ and then computes $c_{1,b}$ = PrivReEval$(x, \tau'_a, \tau_b, s_b)$. If $pk_a$ is uncorrupted, it looks up $rk_{a\leftrightarrow b}$ in the re-encryption key table and evaluates $c_{1,b} \leftarrow$ ReEval$(rk_{a\leftrightarrow b}, c_{1,a})$. Then, it outputs $C_b = (vk, c_{1,b}, c_2, c_3, \tau'_a, \sigma)$ as a re-encrypted ciphertext for $pk_b$.

This modification does not affect the success probability of an adversary because of the equivalence between PrivReEval$(\cdot, \tau_a, \tau_b, s_b)$ and ReEval$(rk_{a\leftrightarrow b}, \text{LossyEval}(s_a, \cdot))$.

Game$_6$: We modify the re-encryption key generation oracle as follows: given a pair $(pk_a, pk_b)$, it looks up the re-encryption key $rk_{a\leftrightarrow b}$ in the table and outputs it.

This modification does not affect the success probability of an adversary.

Game$_7$: We define $q_{\mathcal{A},unc}$ as the maximum number of times that an adversary $\mathcal{A}$ queries the uncorrupted key generation oracle in the game. We modify the challenge oracle as follows.

First, the challenger chooses a random number $r \in \{1, \ldots, q_{\mathcal{A},unc}\}$. If the challenge oracle receives a challenge key $pk^* \ne pk_r$, the challenger outputs a random bit $b$ and aborts the game, where $pk_r$ is the $r$-th public key output by the uncorrupted key generation oracle. Otherwise, it proceeds with the game.

This modification reduces the advantage of an adversary by a factor of $1/q_{\mathcal{A},unc}$. However, this is not a significant loss since $q_{\mathcal{A},unc}$ is polynomial in the security parameter $\lambda$.
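The $1/q_{\mathcal{A},unc}$ loss of Game$_7$ can be checked with exact arithmetic. The sketch below assumes that the guess $r$ is uniform and independent of the adversary's view, that the challenge key is always one of the $q_{\mathcal{A},unc}$ uncorrupted keys, and that an abort yields a uniformly random bit; the function names and the numbers in the demo are hypothetical.

```python
from fractions import Fraction


def advantage(p_win: Fraction) -> Fraction:
    """|Pr[Expt = 1] - 1/2|, as in the PRE-CCA definition."""
    return abs(p_win - Fraction(1, 2))


def win_prob_after_guess(p_win: Fraction, q: int) -> Fraction:
    """Pr[win] in Game_7: with probability 1/q the guess hits pk* and the game
    runs as before; otherwise it aborts with a uniformly random bit."""
    hit = Fraction(1, q)
    return hit * p_win + (1 - hit) * Fraction(1, 2)


if __name__ == "__main__":
    p6, q = Fraction(3, 4), 10            # hypothetical Game_6 win probability and q
    p7 = win_prob_after_guess(p6, q)
    print(p7, advantage(p7), advantage(p6) / q)   # prints 21/40 1/40 1/40
```

Since $\Pr[W_7] - 1/2 = \frac{1}{q}(\Pr[W_6] - 1/2)$ under these assumptions, a non-negligible advantage in Game$_6$ remains non-negligible in Game$_7$ whenever $q$ is polynomial.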

Game$_8$: We modify the uncorrupted key generation oracle as follows.

First, it executes the following preprocessing. It chooses a random number $r \in \{1, \ldots, q_{\mathcal{A},unc}\}$ and generates a pair of keys $(pk_r, sk_r) \leftarrow$ KeyGen(PP). We write $pk_r = (s_r, \tau_r)$ and $sk_r = (td_r, s_r, \tau_r)$. Then, for every $i \in \{1, \ldots, q_{\mathcal{A},unc}\} \setminus \{r\}$, it uses the fake key generation algorithm to generate a public key and a re-encryption key: it computes $(s_i, s_{r\leftrightarrow i}, \tau_i) \leftarrow$ FakeKey$(s_r, \tau_r)$ and sets $pk_i = (s_i, \tau_i)$ and $rk_{r\leftrightarrow i} = s_{r\leftrightarrow i}$. Then, it computes $rk_{i\leftrightarrow j} = s_{i\leftrightarrow j} \leftarrow$ Trans$(s_{r\leftrightarrow i}, s_{r\leftrightarrow j})$ for every $i, j \in \{1, \ldots, q_{\mathcal{A},unc}\} \setminus \{r\}$ with $i \ne j$. It adds all of the above re-encryption keys $\{rk_{i\leftrightarrow j}\}_{i,j}$ to the re-encryption key table.

At the $j$-th query to the uncorrupted key generation oracle ($j \in \{1, \ldots, q_{\mathcal{A},unc}\}$), it outputs the public key $pk_j$ generated by the above preprocessing. The re-encryption key generation oracle and the re-encryption oracle operate with the preprocessed table. The challenge oracle uses the above number $r$ for the check $pk^* = pk_r$.

This modification changes the success probability of an adversary only negligibly, by the statistical indistinguishability of the fake key algorithm.

Game$_9$: We modify the above preprocessing as follows: we replace the first key generation $(s_r, td_r) \leftarrow$ LossyGen$(\tau_r)$ with $(s_r, \perp) \leftarrow$ LossyGen$(\tau_{los})$, where $\tau_{los}$ is the lossy tag.

This modification changes the success probability of an adversary only negligibly, by the computational indistinguishability between injective and lossy indices of the re-applicable LTDFs.

Game$_{10}$: We modify the decryption oracle as follows: the decryption oracle always outputs $\perp$ when it receives a query with $C = (vk^*, c_1, c_2^*, c_3^*, \tau^*, \sigma^*)$ and $pk = (s_{rltdf}, \tau)$.

This modification changes the success probability of an adversary only negligibly, because the average-case min-entropy of $x^*$ is high and every public key $pk = (s, \tau)$ except the challenge key represents an injective function $f_{s,\tau}$. That is, information-theoretically, the adversary cannot compute a $c_1$ such that $f^{-1}_{s,\tau}(c_1) = x^*$ except with negligible probability. This fact implies that Game$_{10}$ is statistically close to Game$_9$. This modification is the reason we attach tags to re-applicable LTDFs.

Through the above sequence, we transform the PRE-CCA game (i.e., Game$_0$) into the last game (i.e., Game$_{10}$). In the last game, Game$_{10}$, no adversary can tell which message is encrypted. The reason is that $h(x^*)$ is statistically close to $U_{k''}$, since $x^*$ has high average-case min-entropy and $h(\cdot)$ extracts a nearly uniform string from $x^*$.
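The extraction step relies only on $h$ being drawn from a pairwise independent family. A standard example of such a family (not necessarily the one the authors have in mind) is the set of affine maps $x \mapsto Ax \oplus b$ over GF(2). The Python sketch below implements it with hypothetical helper names, and checks pairwise independence exhaustively for tiny sizes: for $x_1 \ne x_2$, every output pair should occur equally often over the whole family.

```python
import secrets
from collections import Counter
from itertools import product


def make_hash(rows, b):
    """h(x) = A x XOR b over GF(2); row i of A is the bitmask rows[i]."""
    def h(x: int) -> int:
        out = 0
        for i, row in enumerate(rows):
            out |= (bin(row & x).count("1") & 1) << i   # bit i = <A_i, x> mod 2
        return out ^ b
    return h


def sample_hash(n: int, ell: int, rng=None):
    """Sample a member of the family mapping n-bit inputs to ell-bit outputs."""
    rng = rng or secrets.SystemRandom()
    return make_hash([rng.getrandbits(n) for _ in range(ell)], rng.getrandbits(ell))


def pairwise_counts(n: int, ell: int, x1: int, x2: int) -> Counter:
    """Count (h(x1), h(x2)) over the whole family; feasible only for tiny n, ell."""
    counts = Counter()
    for rows in product(range(2 ** n), repeat=ell):
        for b in range(2 ** ell):
            h = make_hash(rows, b)
            counts[(h(x1), h(x2))] += 1
    return counts


if __name__ == "__main__":
    h = sample_hash(16, 8)
    x, m = 0xBEEF, 0x5A          # hypothetical input x* and 8-bit message
    c3 = h(x) ^ m                # c3 = h(x) XOR m, as in Enc
    assert c3 ^ h(x) == m        # Dec recovers m from c3 and h(x)
```

The family is pairwise independent because, for $x_1 \ne x_2$, $A(x_1 \oplus x_2)$ is uniform and the random offset $b$ makes $h(x_1)$ uniform and independent of it.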

5 Realization of Re-applicable LTDFs Based on DDH Assumption 

In this section, we describe the realization of re-applicable LTDFs from the DDH assumption. We modify the construction proposed by Peikert and Waters.

We now describe LTDFs based on the DDH assumption. We modify the construction proposed by Peikert and Waters in two respects. One is the division of the function-generation algorithm into two algorithms. The other is a change to the encrypted matrix in the injective function generator. Peikert and Waters proposed a function-generation algorithm that creates a function index as a ciphertext of a matrix, encrypted with the ElGamal encryption on matrices. When generating an injective-function index, it encrypts the matrix $I = (g^{\delta_{i,j}})_{i,j}$ on $\mathbb{G}^{n \times n}$, where $\delta_{i,j}$ is the Kronecker delta. When generating a lossy-function index, it encrypts the matrix $0 = (e)_{i,j}$ on $\mathbb{G}^{n \times n}$, where $e$ is the identity in $\mathbb{G}$ (i.e., $e = g^0$). We do the same for generating a lossy-function index, but use a different procedure for generating an injective-function index. We define the set of tags $\mathcal{T}$ as the group $\mathbb{G}$ and the special element $\tau_{los}$ as the identity $e \in \mathbb{G}$. We use the matrix $M = (\tau^{\delta_{i,j}})_{i,j}$ on $\mathbb{G}^{n \times n}$, where $\tau$ is any element of $\mathbb{G}$. When generating an injective-function index, we set $\tau \ne \tau_{los}$. When generating a lossy-function index, we set $\tau = \tau_{los}$.

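- Generation of a public parameter. A parameter generator ParGen first executes a group generator $\mathcal{G}$, which outputs a tuple $(p, \mathbb{G}, g)$. It next selects random numbers $r_1, \ldots, r_n \in_R \mathbb{Z}_p$, then makes a public parameter $C_1$ as
$$C_1 = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = \begin{pmatrix} g^{r_1} \\ \vdots \\ g^{r_n} \end{pmatrix}.$$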

- Generation of function indices. A function generator LossyGen takes as input $C_1$ and a tag $\tau$, where $C_1$ is the public parameter and $\tau$ is an element of $\mathbb{G}$. (We note that $\tau = e$ means execution of the lossy mode; otherwise, it means execution of the injective mode.) It first selects random elements $z_1, z_2, \ldots, z_n \in_R \mathbb{Z}_p$, then computes a function index as
$$C_2 = \begin{pmatrix} c_{1,1} & \cdots & c_{1,n} \\ \vdots & \ddots & \vdots \\ c_{n,1} & \cdots & c_{n,n} \end{pmatrix}, \qquad c_{i,j} = \begin{cases} c_i^{z_j} \cdot \tau & \text{if } i = j, \\ c_i^{z_j} & \text{otherwise.} \end{cases}$$
The function index consists of $(C_1, C_2)$. The trapdoor consists of the random elements $z = (z_1, \ldots, z_n)$.

- Evaluation algorithm. An evaluation algorithm LossyEval takes as input $(C_1, C_2, x)$, where $(C_1, C_2)$ is a function index and $x = (x_1, \ldots, x_n) \in \{0,1\}^n$ is an $n$-bit input interpreted as a vector.
It evaluates the product of $x$ and $C_1$: that is, $y_1 = xC_1 = \prod_{\ell=1}^{n} c_\ell^{x_\ell}$. Next, it evaluates the product of $x$ and $C_2$: that is,
$$y_2 = xC_2 = \Bigl(\prod_{\ell=1}^{n} c_{\ell,1}^{x_\ell}, \ldots, \prod_{\ell=1}^{n} c_{\ell,n}^{x_\ell}\Bigr) = \Bigl(\bigl(\prod_{\ell=1}^{n} c_\ell^{x_\ell}\bigr)^{z_1} \tau^{x_1}, \ldots, \bigl(\prod_{\ell=1}^{n} c_\ell^{x_\ell}\bigr)^{z_n} \tau^{x_n}\Bigr).$$
Finally, it outputs $(y_1, y_2)$.

- Inversion algorithm. An inversion algorithm LossyInv takes as input $(td, \tau, (y_1, y_2))$, where the trapdoor information $td$ consists of $z = (z_1, \ldots, z_n)$, $\tau$ is an element of $\mathbb{G} \setminus \{e\}$, and $y_2 = (y_{2,1}, \ldots, y_{2,n}) \in \mathbb{G}^{1 \times n}$. It computes $w = (y_{2,1} \cdot y_1^{-z_1}, y_{2,2} \cdot y_1^{-z_2}, \ldots, y_{2,n} \cdot y_1^{-z_n})$. Then, if the $j$-th element of $w$ is the identity element of $\mathbb{G}$, it sets $x_j = 0$; else if the $j$-th element of $w$ is $\tau$, it sets $x_j = 1$; otherwise, it outputs $\perp$. Finally, it outputs $x = (x_1, x_2, \ldots, x_n)$.



We can show that the above four algorithms satisfy injectivity, $(n, n - \log p)$-lossiness, and indistinguishability based on the DDH assumption. These proofs are nearly identical to those of Peikert and Waters; therefore, we omit them in this paper.
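The four algorithms can be exercised end to end on a toy group. The Python sketch below is an illustration under loudly stated assumptions: the group is the order-$p$ subgroup of $\mathbb{Z}_q^*$ with $p = 1019$, $q = 2039$, and generator $4$, which is far too small to be secure, and the snake_case function names are our own stand-ins for ParGen, LossyGen, LossyEval, and LossyInv. With $n = 11$ we have $2^n > p$, so the lossy image (at most $p$ points, since the output depends only on $\sum_\ell x_\ell r_\ell \bmod p$) is visibly smaller than the domain.

```python
import secrets

# Toy group (INSECURE, illustration only): G is the order-P subgroup of Z_Q^*,
# Q = 2P + 1. GEN = 4 is a non-trivial quadratic residue, so it generates G.
P, Q, GEN = 1019, 2039, 4
TAU_LOS = 1  # the identity e of G serves as the lossy tag


def rand_exp() -> int:
    return secrets.randbelow(P)


def rand_tag() -> int:
    """Random non-identity element of G, i.e. an injective tag."""
    return pow(GEN, 1 + secrets.randbelow(P - 1), Q)


def par_gen(n):
    """ParGen: the column C1 = (g^{r_1}, ..., g^{r_n}), shared by all users."""
    return [pow(GEN, rand_exp(), Q) for _ in range(n)]


def lossy_gen(c1, tau):
    """LossyGen: c_{i,j} = c_i^{z_j} * tau^{delta_{i,j}}; the trapdoor is z."""
    n = len(c1)
    z = [rand_exp() for _ in range(n)]
    c2 = [[pow(c1[i], z[j], Q) * (tau if i == j else 1) % Q for j in range(n)]
          for i in range(n)]
    return (c1, c2), z


def lossy_eval(s, x):
    """LossyEval: y1 = prod_i c_i^{x_i}, y2_j = prod_i c_{i,j}^{x_i} = y1^{z_j} tau^{x_j}."""
    c1, c2 = s
    n = len(c1)
    y1, y2 = 1, [1] * n
    for i in range(n):
        if x[i]:
            y1 = y1 * c1[i] % Q
            for j in range(n):
                y2[j] = y2[j] * c2[i][j] % Q
    return y1, y2


def lossy_inv(z, tau, y):
    """LossyInv: w_j = y2_j * y1^{-z_j} must be e (x_j = 0) or tau (x_j = 1)."""
    y1, y2 = y
    x = []
    for j, zj in enumerate(z):
        w = y2[j] * pow(y1, -zj, Q) % Q   # modular inverse power (Python 3.8+)
        if w not in (1, tau):
            return None                   # plays the role of the error symbol
        x.append(0 if w == 1 else 1)
    return x
```

Python integers and `secrets` keep the sketch dependency-free; a real instantiation would use a cryptographically sized group.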

Next, we show that the above algorithms satisfy the definition of re-applicable 
LTDFs. 
Re-applying with respect to function indices: Let $\tau_i$ and $\tau_j$ be tags in $\mathcal{T}$ different from $\tau_{los}$. Let $(s_i, td_i)$ and $(s_j, td_j)$ be outputs of LossyGen$(\tau_i)$ and LossyGen$(\tau_j)$, where $td_i = (z_1, \ldots, z_n)$ and $td_j = (z'_1, \ldots, z'_n)$. We now define an algorithm ReIndex, which takes $td_i$ and $td_j$ as inputs and outputs (with subtraction in $\mathbb{Z}_p$)

$$s_{i\leftrightarrow j} = td_j - td_i = (z'_1 - z_1, z'_2 - z_2, \ldots, z'_n - z_n) = (z_{1,i\leftrightarrow j}, \ldots, z_{n,i\leftrightarrow j}).$$

We next define an algorithm ReEval. The algorithm ReEval takes as input $(s_{i\leftrightarrow j}, (y_1, y_2))$, where $(y_1, y_2) = (y_1, (y_{2,1}, y_{2,2}, \ldots, y_{2,n}))$ is an output of LossyEval$(s_i, x)$. It computes
$$y'_2 = (y'_{2,1}, y'_{2,2}, \ldots, y'_{2,n}) = (y_{2,1} \cdot y_1^{z_{1,i\leftrightarrow j}}, y_{2,2} \cdot y_1^{z_{2,i\leftrightarrow j}}, \ldots, y_{2,n} \cdot y_1^{z_{n,i\leftrightarrow j}}).$$
Then, it outputs $(y_1, y'_2)$.


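For the above elements, we can write
$$(y_1, y_2) = \Bigl(g^{\sum_{\ell=1}^{n} x_\ell r_\ell}, \bigl(g^{z_1 \sum_{\ell} x_\ell r_\ell} \cdot \tau_i^{x_1}, \ldots, g^{z_n \sum_{\ell} x_\ell r_\ell} \cdot \tau_i^{x_n}\bigr)\Bigr)$$
and
$$(y_1, y'_2) = \Bigl(g^{\sum_{\ell=1}^{n} x_\ell r_\ell}, \bigl(g^{z'_1 \sum_{\ell} x_\ell r_\ell} \cdot \tau_i^{x_1}, \ldots, g^{z'_n \sum_{\ell} x_\ell r_\ell} \cdot \tau_i^{x_n}\bigr)\Bigr).$$

Therefore, we have $w' = (\tau_i^{x_1}, \ldots, \tau_i^{x_n})$ in the algorithm LossyInv$(td_j, \tau_i, (y_1, y'_2))$. That is, we obtain $x$ = LossyInv$(td_j, \tau_i, (y_1, y'_2))$.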
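The re-application property can be checked on the same toy group. The sketch below repeats the toy helpers so that it runs on its own; the group ($p = 1019$, $q = 2039$, generator $4$) is insecure and the snake_case names are hypothetical stand-ins for the paper's algorithms. Note that both indices are built from the same shared column $C_1$, which is exactly why ReEval can leave $y_1$ unchanged.

```python
import secrets

# Toy group (INSECURE): order-P subgroup of Z_Q^*, Q = 2P + 1, generated by 4.
P, Q, GEN = 1019, 2039, 4


def rand_exp():
    return secrets.randbelow(P)


def rand_tag():
    """Random non-identity element of G (an injective tag)."""
    return pow(GEN, 1 + secrets.randbelow(P - 1), Q)


def par_gen(n):
    """ParGen: shared column C1 = (g^{r_1}, ..., g^{r_n})."""
    return [pow(GEN, rand_exp(), Q) for _ in range(n)]


def lossy_gen(c1, tau):
    """LossyGen: c_{i,j} = c_i^{z_j} * tau^{delta_{i,j}}; trapdoor z."""
    n = len(c1)
    z = [rand_exp() for _ in range(n)]
    c2 = [[pow(c1[i], z[j], Q) * (tau if i == j else 1) % Q for j in range(n)]
          for i in range(n)]
    return (c1, c2), z


def lossy_eval(s, x):
    """LossyEval: y1 = prod c_i^{x_i}, y2_j = y1^{z_j} * tau^{x_j}."""
    c1, c2 = s
    n = len(c1)
    y1, y2 = 1, [1] * n
    for i in range(n):
        if x[i]:
            y1 = y1 * c1[i] % Q
            for j in range(n):
                y2[j] = y2[j] * c2[i][j] % Q
    return y1, y2


def lossy_inv(z, tau, y):
    """LossyInv: w_j = y2_j * y1^{-z_j} must equal e (bit 0) or tau (bit 1)."""
    y1, y2 = y
    x = []
    for j, zj in enumerate(z):
        w = y2[j] * pow(y1, -zj, Q) % Q   # Python 3.8+ modular inverse power
        if w not in (1, tau):
            return None
        x.append(0 if w == 1 else 1)
    return x


def re_index(td_i, td_j):
    """ReIndex: s_{i<->j} = td_j - td_i componentwise in Z_P."""
    return [(b - a) % P for a, b in zip(td_i, td_j)]


def re_eval(s_ij, y):
    """ReEval: y2'_l = y2_l * y1^{z_{l,i<->j}}; y1 is left unchanged."""
    y1, y2 = y
    return y1, [v * pow(y1, d, Q) % Q for v, d in zip(y2, s_ij)]
```

Inverting the re-evaluated output uses $td_j$ together with the original tag $\tau_i$, matching the remark in Definition 2.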
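Generating proper outputs: Let $\tau_i$, $\tau_j$, $(s_i, td_i)$, $(s_j, td_j)$, $s_{i\leftrightarrow j}$, $x$, and $(y_1, y'_2)$ be defined as in the above paragraph. We call $(y_1, y'_2)$ a proper output for $x$, $\tau_i$, $\tau_j$, and $s_j$ if it satisfies $(y_1, y'_2) = \text{ReEval}(s_{i\leftrightarrow j}, \text{LossyEval}(s_i, x))$ for some $s_i$ made from the tag $\tau_i$. From the above algorithms, we can uniquely describe a proper $(y_1, y'_2)$ as
$$(y_1, y'_2) = \Bigl(g^{\sum_{\ell=1}^{n} x_\ell r_\ell}, \bigl(g^{z'_1 \sum_{\ell} x_\ell r_\ell} \cdot \tau_i^{x_1}, \ldots, g^{z'_n \sum_{\ell} x_\ell r_\ell} \cdot \tau_i^{x_n}\bigr)\Bigr),$$
where $td_j = (z'_1, \ldots, z'_n)$.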

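We define a new algorithm PrivReEval, which takes $x$, $\tau_i$, $\tau_j$, and $s_j$ as input, where $x = (x_1, \ldots, x_n)$ is an $n$-bit input. It computes $(y_1, y_2) \leftarrow$ LossyEval$(s_j, x)$. It makes $y'_2$ from $y_2$ as follows: for each $\ell \in [1, n]$, if $x_\ell = 1$ then $y'_{2,\ell} \leftarrow y_{2,\ell} \cdot \tau_j^{-1} \cdot \tau_i$, else $y'_{2,\ell} \leftarrow y_{2,\ell}$, where $y_{2,\ell}$ and $y'_{2,\ell}$ are the $\ell$-th elements of $y_2$ and $y'_2$. Finally, it outputs $(y_1, y'_2)$. The algorithm PrivReEval$(x, \tau_i, \tau_j, s_j)$ always computes a proper output for $x$, $\tau_i$, $\tau_j$, and $s_j$. The reason is that, from the output
$$(y_1, y_2) = \Bigl(g^{\sum_{\ell} x_\ell r_\ell}, \bigl(g^{z'_1 \sum_{\ell} x_\ell r_\ell} \cdot \tau_j^{x_1}, \ldots, g^{z'_n \sum_{\ell} x_\ell r_\ell} \cdot \tau_j^{x_n}\bigr)\Bigr)$$
of LossyEval$(s_j, x)$, we obtain
$$(y_1, y'_2) = \Bigl(g^{\sum_{\ell} x_\ell r_\ell}, \bigl(g^{z'_1 \sum_{\ell} x_\ell r_\ell} \cdot \tau_i^{x_1}, \ldots, g^{z'_n \sum_{\ell} x_\ell r_\ell} \cdot \tau_i^{x_n}\bigr)\Bigr),$$
in which $\tau_j$ is replaced with $\tau_i$. This means that PrivReEval$(\cdot, \tau_i, \tau_j, s_j)$ is equivalent to ReEval$(s_{i\leftrightarrow j}, \text{LossyEval}(s_i, \cdot))$ as a function.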
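The functional equivalence of PrivReEval and ReEval can be confirmed exhaustively for a small $n$. As before, this is a self-contained sketch on an insecure toy group ($p = 1019$, $q = 2039$, generator $4$) with hypothetical snake_case names; PrivReEval here only needs $x$, the two tags, and the target index $s_j$, never $s_i$ or any trapdoor.

```python
import secrets

# Toy group (INSECURE): order-P subgroup of Z_Q^*, Q = 2P + 1, generated by 4.
P, Q, GEN = 1019, 2039, 4


def rand_exp():
    return secrets.randbelow(P)


def rand_tag():
    """Random non-identity element of G (an injective tag)."""
    return pow(GEN, 1 + secrets.randbelow(P - 1), Q)


def par_gen(n):
    """ParGen: shared column C1 = (g^{r_1}, ..., g^{r_n})."""
    return [pow(GEN, rand_exp(), Q) for _ in range(n)]


def lossy_gen(c1, tau):
    """LossyGen: c_{i,j} = c_i^{z_j} * tau^{delta_{i,j}}; trapdoor z."""
    n = len(c1)
    z = [rand_exp() for _ in range(n)]
    c2 = [[pow(c1[i], z[j], Q) * (tau if i == j else 1) % Q for j in range(n)]
          for i in range(n)]
    return (c1, c2), z


def lossy_eval(s, x):
    """LossyEval: y1 = prod c_i^{x_i}, y2_j = y1^{z_j} * tau^{x_j}."""
    c1, c2 = s
    n = len(c1)
    y1, y2 = 1, [1] * n
    for i in range(n):
        if x[i]:
            y1 = y1 * c1[i] % Q
            for j in range(n):
                y2[j] = y2[j] * c2[i][j] % Q
    return y1, y2


def re_index(td_i, td_j):
    """ReIndex: s_{i<->j} = td_j - td_i componentwise in Z_P."""
    return [(b - a) % P for a, b in zip(td_i, td_j)]


def re_eval(s_ij, y):
    """ReEval: y2'_l = y2_l * y1^{z_{l,i<->j}}."""
    y1, y2 = y
    return y1, [v * pow(y1, d, Q) % Q for v, d in zip(y2, s_ij)]


def priv_re_eval(x, tau_i, tau_j, s_j):
    """PrivReEval: evaluate under s_j, then swap tau_j for tau_i where x_l = 1."""
    y1, y2 = lossy_eval(s_j, x)
    swap = pow(tau_j, -1, Q) * tau_i % Q       # tau_j^{-1} * tau_i (Python 3.8+)
    return y1, [v * swap % Q if bit else v for v, bit in zip(y2, x)]
```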

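Transitivity: We define an algorithm Trans, which takes $s_{i\leftrightarrow j}$ and $s_{i\leftrightarrow k}$ and outputs $s_{i\leftrightarrow k} - s_{i\leftrightarrow j} = (td_k - td_i) - (td_j - td_i) = td_k - td_j = s_{j\leftrightarrow k}$. That is, Trans$(s_{i\leftrightarrow j}, s_{i\leftrightarrow k}) = s_{j\leftrightarrow k}$.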
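Because re-encryption keys in this instantiation are just differences of trapdoor vectors in $\mathbb{Z}_p^n$, Trans is a single componentwise subtraction. A minimal sketch with a toy, insecure modulus ($p = 1019$) and hypothetical function names:

```python
import secrets

P = 1019  # toy group order (INSECURE); trapdoors and re-keys live in Z_P^n


def re_index(td_i, td_j):
    """ReIndex: s_{i<->j} = td_j - td_i."""
    return [(b - a) % P for a, b in zip(td_i, td_j)]


def trans(s_ij, s_ik):
    """Trans: s_{i<->k} - s_{i<->j} = (td_k - td_i) - (td_j - td_i) = s_{j<->k}."""
    return [(b - a) % P for a, b in zip(s_ij, s_ik)]


if __name__ == "__main__":
    n = 8
    td_i, td_j, td_k = ([secrets.randbelow(P) for _ in range(n)] for _ in range(3))
    assert trans(re_index(td_i, td_j), re_index(td_i, td_k)) == re_index(td_j, td_k)
```

This is what lets the challenger of Game$_8$ derive every key $rk_{i\leftrightarrow j}$ from keys issued relative to the single user $r$.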

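Statistical indistinguishability of the fake key: Now, we define an algorithm FakeKey, which takes a function index $s_i$ and a tag $\tau_i$, and makes a fake index $s_j$, a fake re-key $s_{i\leftrightarrow j}$, and a fake tag $\tau_j$. The fake key generator FakeKey takes as input $s_i = (C_1, C_2)$ and $\tau_i \in \mathbb{G}$, where $(C_1, C_2)$ is a function index. It then selects a random element $t \in \mathbb{G}$. It next chooses a random vector $s_{i\leftrightarrow j} = (z_{1,i\leftrightarrow j}, \ldots, z_{n,i\leftrightarrow j}) \in_R \mathbb{Z}_p^n$ and makes a new matrix $C'_2 = (c'_{k,\ell})$ as follows:

$$c'_{k,\ell} = \begin{cases} c_{k,\ell} \cdot c_k^{z_{\ell,i\leftrightarrow j}} \cdot t & \text{if } k = \ell, \\ c_{k,\ell} \cdot c_k^{z_{\ell,i\leftrightarrow j}} & \text{otherwise,} \end{cases}$$

where $c_k$ is the $k$-th entry of $C_1$, and $c_{k,\ell}$ is the $(k, \ell)$ entry of $C_2$. Finally, it outputs $s_j = (C_1, C'_2)$, $s_{i\leftrightarrow j} = (z_{1,i\leftrightarrow j}, \ldots, z_{n,i\leftrightarrow j})$, and $\tau_j = \tau_i \cdot t$.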

From the above description, outputs of FakeKey$(s_i, \tau_i)$ and properly generated indices have the same distribution, except with probability $1/p$. The reason is that, when $s_i$ and $\tau_i$ are generated in the proper way, we can write $C'_2 = (c'_{k,\ell})$ as

$$c'_{k,\ell} = \begin{cases} g^{r_k(z_\ell + z_{\ell,i\leftrightarrow j})} \cdot \tau_i \cdot t & \text{if } k = \ell, \\ g^{r_k(z_\ell + z_{\ell,i\leftrightarrow j})} & \text{otherwise,} \end{cases}$$

where $(z_1, \ldots, z_n)$ is the trapdoor of $s_i$. This means that, conditioned on $t \ne (\tau_i)^{-1}$, the distribution of $\{\tau_i, \tau_j, (s_i, td_i), (s_j, td_j), s_{i\leftrightarrow j}\}$, with $td_j = (z_1 + z_{1,i\leftrightarrow j}, \ldots, z_n + z_{n,i\leftrightarrow j})$, is identical to that of $\{\tau_i, \tau_j, \text{LossyGen}(\tau_i), \text{LossyGen}(\tau_j), \text{ReIndex}(td_i, td_j)\}$. That is, the two distributions are statistically indistinguishable, since the probability of $t = (\tau_i)^{-1}$ is $1/p$.
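The algebra behind FakeKey can be checked numerically. The sketch below is self-contained and uses the same insecure toy group ($p = 1019$, $q = 2039$, generator $4$) and hypothetical snake_case names. In the test, the implied trapdoor $td_j = td_i + s_{i\leftrightarrow j}$ is reconstructed only to verify the claim; FakeKey itself never outputs it.

```python
import secrets

# Toy group (INSECURE): order-P subgroup of Z_Q^*, Q = 2P + 1, generated by 4.
P, Q, GEN = 1019, 2039, 4


def rand_exp():
    return secrets.randbelow(P)


def rand_tag():
    """Random non-identity element of G (an injective tag)."""
    return pow(GEN, 1 + secrets.randbelow(P - 1), Q)


def par_gen(n):
    """ParGen: shared column C1 = (g^{r_1}, ..., g^{r_n})."""
    return [pow(GEN, rand_exp(), Q) for _ in range(n)]


def lossy_gen(c1, tau):
    """LossyGen: c_{i,j} = c_i^{z_j} * tau^{delta_{i,j}}; trapdoor z."""
    n = len(c1)
    z = [rand_exp() for _ in range(n)]
    c2 = [[pow(c1[i], z[j], Q) * (tau if i == j else 1) % Q for j in range(n)]
          for i in range(n)]
    return (c1, c2), z


def lossy_eval(s, x):
    """LossyEval: y1 = prod c_i^{x_i}, y2_j = y1^{z_j} * tau^{x_j}."""
    c1, c2 = s
    n = len(c1)
    y1, y2 = 1, [1] * n
    for i in range(n):
        if x[i]:
            y1 = y1 * c1[i] % Q
            for j in range(n):
                y2[j] = y2[j] * c2[i][j] % Q
    return y1, y2


def lossy_inv(z, tau, y):
    """LossyInv: w_j = y2_j * y1^{-z_j} must equal e (bit 0) or tau (bit 1)."""
    y1, y2 = y
    x = []
    for j, zj in enumerate(z):
        w = y2[j] * pow(y1, -zj, Q) % Q   # Python 3.8+ modular inverse power
        if w not in (1, tau):
            return None
        x.append(0 if w == 1 else 1)
    return x


def re_eval(s_ij, y):
    """ReEval: y2'_l = y2_l * y1^{z_{l,i<->j}}."""
    y1, y2 = y
    return y1, [v * pow(y1, d, Q) % Q for v, d in zip(y2, s_ij)]


def fake_key(s_i, tau_i):
    """FakeKey: c'_{k,l} = c_{k,l} * c_k^{d_l} * t^{delta_{k,l}}; tau_j = tau_i * t."""
    c1, c2 = s_i
    n = len(c1)
    t = pow(GEN, rand_exp(), Q)            # uniform element of G (may be e)
    d = [rand_exp() for _ in range(n)]     # the fake re-key s_{i<->j}
    c2_new = [[c2[row][col] * pow(c1[row], d[col], Q) * (t if row == col else 1) % Q
               for col in range(n)] for row in range(n)]
    return (c1, c2_new), d, tau_i * t % Q
```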

Generation of injective functions from lossy functions: Next, we consider FakeKey when $s_i = (C_1, C_2)$ is output by LossyGen$(\tau_{los})$. In this case, we can write $C'_2 = (c'_{k,\ell})$ as

$$c'_{k,\ell} = \begin{cases} g^{r_k(z_\ell + z_{\ell,i\leftrightarrow j})} \cdot t & \text{if } k = \ell, \\ g^{r_k(z_\ell + z_{\ell,i\leftrightarrow j})} & \text{otherwise,} \end{cases}$$

where $(z_1, \ldots, z_n)$ are the discrete logarithms relating $C_1$ and $C_2$ (i.e., $c_{k,\ell} = c_k^{z_\ell}$). Then, assuming $t \ne e$, LossyEval$(s_j, \cdot)$ represents an injective function $f_{s_j,*}$. This event occurs with probability $1 - 1/p$, which is overwhelming in the security parameter $\lambda$.
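The claim that FakeKey turns a lossy index into an injective one can be observed by counting images on the toy group. In this self-contained sketch (insecure parameters $p = 1019$, $q = 2039$, generator $4$; hypothetical snake_case names), $n = 11$ gives $2^{11} = 2048 > p$ inputs, so the lossy index has at most $p$ distinct outputs, whereas the index returned by FakeKey should have all $2^n$ outputs distinct whenever $t \ne e$.

```python
import secrets

# Toy group (INSECURE): order-P subgroup of Z_Q^*, Q = 2P + 1, generated by 4.
P, Q, GEN = 1019, 2039, 4
TAU_LOS = 1  # the identity e is the lossy tag


def rand_exp():
    return secrets.randbelow(P)


def par_gen(n):
    """ParGen: shared column C1 = (g^{r_1}, ..., g^{r_n})."""
    return [pow(GEN, rand_exp(), Q) for _ in range(n)]


def lossy_gen(c1, tau):
    """LossyGen: c_{i,j} = c_i^{z_j} * tau^{delta_{i,j}}; trapdoor z."""
    n = len(c1)
    z = [rand_exp() for _ in range(n)]
    c2 = [[pow(c1[i], z[j], Q) * (tau if i == j else 1) % Q for j in range(n)]
          for i in range(n)]
    return (c1, c2), z


def lossy_eval(s, x):
    """LossyEval: y1 = prod c_i^{x_i}, y2_j = prod c_{i,j}^{x_i}."""
    c1, c2 = s
    n = len(c1)
    y1, y2 = 1, [1] * n
    for i in range(n):
        if x[i]:
            y1 = y1 * c1[i] % Q
            for j in range(n):
                y2[j] = y2[j] * c2[i][j] % Q
    return y1, y2


def fake_key(s_i, tau_i):
    """FakeKey: c'_{k,l} = c_{k,l} * c_k^{d_l} * t^{delta_{k,l}}; tau_j = tau_i * t."""
    c1, c2 = s_i
    n = len(c1)
    t = pow(GEN, rand_exp(), Q)            # uniform element of G (may be e)
    d = [rand_exp() for _ in range(n)]     # the fake re-key s_{i<->j}
    c2_new = [[c2[row][col] * pow(c1[row], d[col], Q) * (t if row == col else 1) % Q
               for col in range(n)] for row in range(n)]
    return (c1, c2_new), d, tau_i * t % Q


def image_size(s, n):
    """Number of distinct outputs of LossyEval(s, .) over all of {0,1}^n."""
    inputs = ([(v >> b) & 1 for b in range(n)] for v in range(2 ** n))
    return len({(y1, tuple(y2)) for y1, y2 in (lossy_eval(s, x) for x in inputs)})
```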
Abstract. We propose new and improved instantiations of lossy trapdoor functions (Peikert and Waters, STOC ’08), and correlation-secure trapdoor functions (Rosen and Segev, TCC ’09). Our constructions widen the set of number-theoretic assumptions upon which these primitives can be based, and are summarized as follows:

— Lossy trapdoor functions based on the quadratic residuosity assumption. Our construction relies on modular squaring, and whereas previous such constructions were based on seemingly stronger assumptions, we present the first construction that is based solely on the quadratic residuosity assumption.

— Lossy trapdoor functions based on the composite residuosity assumption. Our construction guarantees essentially any required amount of lossiness, while at the same time the functions are more efficient than the matrix-based approach of Peikert and Waters.

— Lossy trapdoor functions based on the d-Linear assumption. Our construction both simplifies the DDH-based construction of Peikert and Waters, and admits a generalization to the whole family of d-Linear assumptions without any loss of efficiency.

— Correlation-secure trapdoor functions related to the hardness of syndrome decoding.

Keywords: Public-key encryption, lossy trapdoor functions, correlation- 
secure trapdoor functions. 


1 Introduction 

In this paper, we describe new constructions of lossy trapdoor functions and correlation-secure trapdoor functions. These primitives are strengthened variants of the classical notion of trapdoor functions, and were introduced with the main goal of enabling simple and black-box constructions of public-key encryption schemes that are secure against chosen-ciphertext attacks. At a high level, they are defined as follows:


Lossy trapdoor functions [25]: A collection of lossy trapdoor functions consists of two families of functions. Functions in one family are injective and can be efficiently inverted using a trapdoor. Functions in the other family are “lossy,” which means that the size of their image is significantly smaller than the size of their domain. The only computational requirement is that a description of a randomly chosen function from the family of injective functions is computationally indistinguishable from a description of a randomly chosen function from the family of lossy functions.

Correlation-secure trapdoor functions [26]: The classical notion of a one-way function asks for a function that is efficiently computable but is hard to invert given the image of a uniformly chosen input. Correlation security generalizes the one-wayness requirement by considering $k$-wise products of functions and any specified input distribution, not necessarily the uniform distribution. Given a collection of functions $\mathcal{F}$ and a distribution $\mathcal{C}$ over $k$-tuples of inputs, we say that $\mathcal{F}$ is secure under $\mathcal{C}$-correlated inputs if the function $(f_1(x_1), \ldots, f_k(x_k))$ is one-way, where $f_1, \ldots, f_k$ are independently chosen from $\mathcal{F}$ and $(x_1, \ldots, x_k)$ are sampled from $\mathcal{C}$.


Lossy trapdoor functions were introduced by Peikert and Waters [25], who showed that they imply fundamental cryptographic primitives, such as trapdoor functions, collision-resistant hash functions, oblivious transfer, and CCA-secure public-key encryption. In addition, lossy trapdoor functions have already found various other applications, including deterministic public-key encryption [2], OAEP-based public-key encryption [13], “hedged” public-key encryption for protecting against bad randomness [1], security against selective opening attacks [2], and efficient non-interactive string commitments [22].

The notion of correlation security was introduced by Rosen and Segev [26], who showed that any collection of injective trapdoor functions that is one-way under a natural input distribution can be used to construct a CCA-secure public-key encryption scheme.¹ They showed that any collection of lossy trapdoor functions that are sufficiently lossy is in fact also correlation-secure. This result was recently refined by Mol and Yilek [19], who showed that even lossiness of any polynomial fraction of a single bit suffices.

These applications motivate us to investigate new constructions of lossy and 
correlation-secure functions. Such constructions would enable us to widen the 
basis upon which one can achieve the above cryptographic tasks in a simple and 
modular way. 


1 Any distribution where $(x_1, \ldots, x_k)$ are $(1 - \epsilon)k$-wise independent, for a constant $\epsilon < 1$, can be used in their framework. In particular, this includes the case where $x_1$ is uniformly distributed and $x_1 = \cdots = x_k$.
1.1 Our Contributions

We propose new and improved constructions of lossy and correlation-secure trapdoor functions based on well-established number-theoretic assumptions (some of which were previously not known to imply either of the primitives). By directly applying the results of the works cited above, we obtain new CCA-secure public-key encryption schemes based on these assumptions. Concretely, we present the following constructions:

1. Lossy trapdoor permutations based on the quadratic residuosity assumption. Our construction relies on Rabin’s modular squaring function and is based solely on the quadratic residuosity assumption. More precisely, the function is defined as $f(x) = x^2 \cdot \delta_{r,s}(x) \bmod N$, where $N = PQ$ is an RSA modulus and $\delta_{r,s}(\cdot)$ is a function indexed by two public elements $r, s \in \mathbb{Z}_N^*$ serving two independent purposes. First, it extends the modular squaring function to a permutation over $\mathbb{Z}_N^*$. Second, $f(x)$ loses the information about the sign of $x$ if and only if $s$ is a quadratic residue. Therefore, under the quadratic residuosity assumption $f$ has one bit of lossiness. We note that although a function with only one bit of lossiness (or, more generally, with only a non-negligible amount of lossiness) is not necessarily a (strong) one-way function, it nevertheless can be used as a building block for constructing even a CCA-secure public-key encryption scheme (see [19, 26]).

2. Lossy trapdoor functions based on the composite residuosity assumption. Our construction is based on the Damgård–Jurik encryption scheme [8] with additional insights by Damgård and Nielsen [9, 10]. The Damgård–Jurik scheme is based on computations in the group $\mathbb{Z}_{N^{s+1}}$, where $N = PQ$ is an RSA modulus and $s \geq 1$ is an integer (it contains Paillier’s encryption scheme [23] as a special case by setting $s = 1$). At a high level, each function is described by a pair $(pk, c)$, where $pk$ is a public key for the encryption scheme, and $c$ is either an encryption of 1 (injective mode) or an encryption of 0 (lossy mode). By using the homomorphic properties of the encryption scheme, given such a ciphertext $c$ and an element $x$, it is possible to compute either an encryption of $x$ in the injective mode, or an encryption of 0 in the lossy mode. We note that this construction was concurrently and independently proposed by Boldyreva et al. [3]. We also give an “all-but-one” version of the construction.

3. Lossy trapdoor functions based on the d-Linear assumption. Our construction both simplifies and generalizes the DDH-based construction of Peikert and Waters [25, Section 5]. (Recall that DDH is the 1-Linear assumption.) Let $G$ be a finite group of order $p$ and choose an $n \times n$ matrix $M$ over $\mathbb{F}_p$ that has either rank $d$ (lossy mode) or rank $n$ (injective mode). We “encrypt” $M = (a_{ij})$ as the matrix $g^M = (g^{a_{ij}}) \in G^{n \times n}$, where $g$ is a generator of $G$. If $x$ is a binary vector of length $n$, then given $g^M$ we can efficiently evaluate the function $f_M(x) = g^{Mx} \in G^n$. If $M$ has rank $n$, then given $M$ we can efficiently invert $f_M$ on the image of $\{0,1\}^n$. On the other hand, if $M$ has rank $d$ and $p < 2^{n/d}$, then $f_M$ is lossy. The d-Linear assumption implies that the lossy and injective modes cannot be efficiently distinguished. We also give an “all-but-one” version of the function $f_M$ based on the DDH assumption.
4. Correlation-secure trapdoor functions based on the hardness of syndrome decoding. Our construction is based on Niederreiter’s coding-based encryption system [21], which itself is the dual of the McEliece encryption system [18]. Our trapdoor function is defined as $f(x) = Hx$, where $H$ is a binary $(n - k) \times n$ matrix (of a certain distribution that allows for embedding a trapdoor) and $x$ is a bit string of small Hamming weight. We show that the function’s correlation security is directly implied by a result of Fischer and Stern [12] about the pseudorandomness of the function $f$. Interestingly, the related McEliece trapdoor function (which can be viewed as the dual of the Niederreiter function) is not correlation-secure.² It is, however, possible to extend the framework of correlation security in a natural way to obtain a direct construction of a CCA-secure encryption scheme from the McEliece trapdoor function. This was recently demonstrated by Dowsley et al. [11] (who proposed the first coding-based encryption scheme that is CCA-secure in the standard model) and, for the related lattice case, independently by Peikert [24] and Goldwasser and Vaikuntanathan [14]. Our contribution is to show that the Niederreiter function admits a simple construction of correlation-secure trapdoor functions based on the same security assumptions as [11].³ The resulting CCA-secure encryption scheme is as efficient as the one from [11].
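As a concrete picture of item 4, the Python sketch below evaluates a syndrome map $x \mapsto Hx$ over GF(2) on a low-weight input and forms the outputs of several independently sampled functions on the same input, the $k$-correlated setting. It is a hypothetical illustration only: each $H$ is a uniformly random binary matrix with no embedded trapdoor (the actual construction draws $H$ from a distribution that admits one), and the sizes are toy values.

```python
import random

rng = random.Random(0)
n, k, w = 16, 8, 2      # toy code length, code dimension, and input Hamming weight
copies = 3              # number of independently sampled functions

def random_parity_check(rows, cols):
    """Uniformly random binary matrix (no trapdoor; illustration only)."""
    return [[rng.randrange(2) for _ in range(cols)] for _ in range(rows)]

def low_weight_vector(length, weight):
    """Binary vector of the given Hamming weight."""
    x = [0] * length
    for i in rng.sample(range(length), weight):
        x[i] = 1
    return x

def syndrome(H, x):
    """f_H(x) = Hx over GF(2)."""
    return [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]

Hs = [random_parity_check(n - k, n) for _ in range(copies)]
x = low_weight_vector(n, w)
outputs = [syndrome(H, x) for H in Hs]   # (H_1 x, ..., H_k x) on one shared input
```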


1.2 Related Work 

Most of the known constructions and applications of lossy and correlation-secure trapdoor functions are already mentioned above; here we include a few more. Besides their construction based on DDH, Peikert and Waters [25] also present a construction of lossy trapdoor functions based on the worst-case hardness of lattice problems. The construction does not enjoy the same amount of lossiness as their DDH-based one, but it still suffices for their construction of a CCA-secure public-key encryption scheme. The worst-case hardness of lattice problems is also used by Peikert [24] and by Goldwasser and Vaikuntanathan [14] to construct a CCA-secure encryption scheme using a natural generalization of correlation-secure trapdoor functions.

Kiltz et al. [17] show that the RSA trapdoor permutation is lossy under the Φ-Hiding assumption of Cachin et al. [6]. (Concretely, it has $\log_2(e)$ bits of lossiness, where $e$ is the public RSA exponent.) Furthermore, they propose multi-prime hardness assumptions under which RSA has greater lossiness.

In concurrent and independent work, Mol and Yilek [19] propose a lossy trapdoor function based on the modular squaring function. Though this construction is related to ours, its security is based on the stronger assumption that a random two-prime RSA modulus is indistinguishable from a random three-prime RSA modulus. In another concurrent and independent work, Hemenway and Ostrovsky [15] generalize the framework of Peikert and Waters [25] to rely on any homomorphic hash proof system, a natural generalization of hash proof systems introduced by Cramer and Shoup [7]. Hemenway and Ostrovsky then show that homomorphic hash proof systems can be constructed based on either the quadratic residuosity assumption or the composite residuosity assumption. Their approach is significantly different from ours, and the resulting constructions seem incomparable when considering the trade-off between efficiency and lossiness.

2 The McEliece trapdoor function is defined as $f'_H(x, e) := Hx \oplus e$, where $H$ is a binary $k \times n$ matrix, $x$ is a $k$-bit string and $e$ is an error vector of small Hamming weight. Given $H_1$, $H_2$ and two evaluations $y_1 = H_1x \oplus e$ and $y_2 = H_2x \oplus e$, one can reconstruct the unique $x$ by solving $(H_1 \oplus H_2)x = y_1 \oplus y_2$ for $x$.

3 We remark that our construction of a correlation-secure trapdoor function from coding theory does not carry over to the lattice case since the “dual” of the one-way function used in [24, 14] is not injective.
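The attack described in footnote 2 is plain linear algebra over GF(2): XORing the two evaluations cancels the shared error vector. The Python sketch below is illustrative only. The matrices are uniformly random rather than McEliece public keys, $H_1, H_2$ are taken as $n \times k$ matrices acting on column vectors so that $Hx$ is defined, and the sizes are toy values.

```python
import random

rng = random.Random(7)
n, k, t = 24, 8, 3      # toy output length, message length, and error weight

def mat_vec(H, x):
    """Hx over GF(2) for an n x k matrix H given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) % 2 for row in H]

def solve_gf2(A, b):
    """Solve Ax = b over GF(2) when A has full column rank; None otherwise."""
    rows, cols = len(A), len(A[0])
    aug = [A[i][:] + [b[i]] for i in range(rows)]
    for c in range(cols):
        piv = next((i for i in range(c, rows) if aug[i][c]), None)
        if piv is None:
            return None                        # rank deficient: x not unique
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(rows):
            if i != c and aug[i][c]:
                aug[i] = [u ^ v for u, v in zip(aug[i], aug[c])]
    if any(aug[i][cols] for i in range(cols, rows)):
        return None                            # inconsistent system
    return [aug[i][cols] for i in range(cols)]

def rand_matrix():
    return [[rng.randrange(2) for _ in range(k)] for _ in range(n)]

x = [rng.randrange(2) for _ in range(k)]
e = [0] * n
for i in rng.sample(range(n), t):              # one shared low-weight error vector
    e[i] = 1

while True:
    H1, H2 = rand_matrix(), rand_matrix()
    y1 = [u ^ v for u, v in zip(mat_vec(H1, x), e)]
    y2 = [u ^ v for u, v in zip(mat_vec(H2, x), e)]
    D = [[a ^ b for a, b in zip(r1, r2)] for r1, r2 in zip(H1, H2)]   # H1 xor H2
    recovered = solve_gf2(D, [u ^ v for u, v in zip(y1, y2)])
    if recovered is not None:                  # resample if H1 xor H2 is rank deficient
        break

print(recovered == x)  # True
```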

1.3 Paper Organization

The remainder of this paper is organized as follows. In Section 2 we review the definitions of lossy and correlation-secure trapdoor functions. In Sections 3, 4, and 5 we present our constructions that are based on the quadratic residuosity assumption, the d-Linear assumption, and the hardness of syndrome decoding, respectively. Due to space constraints, the construction based on the composite residuosity assumption is only given in the full version [13].

2 Preliminaries 

2.1 Lossy Trapdoor Functions 

A collection of lossy trapdoor functions consists of two families of functions. Functions in one family are injective and can be efficiently inverted using a trapdoor. Functions in the other family are “lossy,” which means that the size of their image is significantly smaller than the size of their domain. The only computational requirement is that a description of a randomly chosen function from the family of injective functions is computationally indistinguishable from a description of a randomly chosen function from the family of lossy functions.

Definition 2.1 (Lossy trapdoor functions). A collection of $(n, \ell)$-lossy trapdoor functions is a 4-tuple of probabilistic polynomial-time algorithms $(G_0, G_1, F, F^{-1})$ such that:

1. Sampling a lossy function: $G_0(1^n)$ outputs a function index $\sigma \in \{0,1\}^*$.

2. Sampling an injective function: $G_1(1^n)$ outputs a pair $(\sigma, \tau) \in \{0,1\}^* \times \{0,1\}^*$. (Here $\sigma$ is a function index and $\tau$ is a trapdoor.)

3. Evaluation of lossy functions: For every function index $\sigma$ produced by $G_0$, the algorithm $F(\sigma, \cdot)$ computes a function $f_\sigma : \{0,1\}^n \to \{0,1\}^*$, whose image is of size at most $2^{n-\ell}$.

4. Evaluation of injective functions: For every pair $(\sigma, \tau)$ produced by $G_1$, the algorithm $F(\sigma, \cdot)$ computes an injective function $f_\sigma : \{0,1\}^n \to \{0,1\}^*$.

5. Inversion of injective functions: For every pair $(\sigma, \tau)$ produced by $G_1$ and every $x \in \{0,1\}^n$, we have $F^{-1}(\tau, F(\sigma, x)) = x$.

6. Security: The two ensembles $\{\sigma : \sigma \leftarrow G_0(1^n)\}_{n \in \mathbb{N}}$ and $\{\sigma : (\sigma, \tau) \leftarrow G_1(1^n)\}_{n \in \mathbb{N}}$ are computationally indistinguishable.

Note that $\ell$ can be a function of $n$. Note also that we do not specify the output of $F^{-1}$ on inputs not in the image of $f_\sigma$.
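For intuition about item 3, lossiness can be measured by brute force on toy domains: a function on $\{0,1\}^n$ whose image has size $2^{n-\ell}$ loses $\ell$ bits. The Python sketch below uses two hypothetical toy functions (an identity map and a map that discards the lowest bit); neither is a trapdoor function, and enumerating the domain is feasible only for tiny $n$.

```python
from math import log2

def lossiness(f, n):
    """Return n - log2 |f({0,1}^n)| by enumerating all 2**n inputs (toy sizes only)."""
    image = {f(x) for x in range(2 ** n)}
    return n - log2(len(image))

n = 10
injective = lambda x: x        # image of size 2**n: nothing is lost
lossy = lambda x: x >> 1       # drops the lowest bit: 2-to-1, image of size 2**(n-1)
print(lossiness(injective, n), lossiness(lossy, n))  # 0.0 1.0
```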

A collection of all-but-one lossy trapdoor functions is a more general primitive. Such a collection is associated with a set $B$, whose members are referred to as branches. (If $B = \{0,1\}$ then we obtain the previous notion of lossy trapdoor functions.) The sampling algorithm of the collection receives an additional parameter $b^* \in B$, and outputs a description of a function $f(\cdot, \cdot)$ together with a trapdoor $\tau$ and a set of lossy branches $\beta$. The function $f$ has the property that for any branch $b \notin \beta$ the function $f(b, \cdot)$ is injective (and can be inverted using $\tau$), and the function $f(b^*, \cdot)$ is lossy. Moreover, the description of $f$ hides (in a computational sense) the set of lossy branches $\beta$.

Our definition is slightly more general than that of Peikert and Waters [25, Section 3.2], which allows only one lossy branch (i.e., $\beta = \{b^*\}$). We allow possibly many lossy branches (other than $b^*$), and require that given a description of a function and $b^*$ it is computationally infeasible to find another lossy branch. The proof of security of the Peikert–Waters CCA-secure public-key encryption scheme [25, Section 4.3] can easily be adapted to our more general context. (We are currently not aware of other applications of all-but-one lossy trapdoor functions.)

Definition 2.2 (All-but-one lossy trapdoor functions). A collection of $(n, \ell)$-all-but-one lossy trapdoor functions is a 4-tuple of probabilistic polynomial-time algorithms $(B, G, F, F^{-1})$ such that:

1. Sampling a branch: $B(1^n)$ outputs a value $b \in \{0,1\}^*$.

2. Sampling a function: For every value $b^*$ produced by $B(1^n)$, the algorithm $G(1^n, b^*)$ outputs a triple $(\sigma, \tau, \beta) \in \{0,1\}^* \times \{0,1\}^* \times \{0,1\}^*$ consisting of a function index $\sigma$, a trapdoor $\tau$, and a set of lossy branches $\beta$ with $b^* \in \beta$.

3. Evaluation of lossy functions: For every value $b^*$ produced by $B(1^n)$ and for every $(\sigma, \tau, \beta)$ produced by $G(1^n, b^*)$, the algorithm $F(\sigma, b^*, \cdot)$ computes a function $f_{\sigma,b^*} : \{0,1\}^n \to \{0,1\}^*$, whose image is of size at most $2^{n-\ell}$.

4. Evaluation of injective functions: For any $b^*$ and $b$ produced by $B(1^n)$ and for every $(\sigma, \tau, \beta)$ produced by $G(1^n, b^*)$, if $b \notin \beta$, then the algorithm $F(\sigma, b, \cdot)$ computes an injective function $f_{\sigma,b} : \{0,1\}^n \to \{0,1\}^*$.

5. Inversion of injective functions: For any $b^*$ and $b$ produced by $B(1^n)$ and for every $(\sigma, \tau, \beta)$ produced by $G(1^n, b^*)$, if $b \notin \beta$ then for every $x \in \{0,1\}^n$ we have

$$F^{-1}(\tau, b, F(\sigma, b, x)) = x.$$

6. Security: For any two sequences $\{(b_n^*, b_n)\}_{n \in \mathbb{N}}$ such that $b_n^*$ and $b_n$ are distinct values in the image of $B(1^n)$, the two ensembles $\{\sigma : (\sigma, \tau, \beta) \leftarrow G(1^n, b_n^*)\}_{n \in \mathbb{N}}$ and $\{\sigma : (\sigma, \tau, \beta) \leftarrow G(1^n, b_n)\}_{n \in \mathbb{N}}$ are computationally indistinguishable.

7. Hiding lossy branches: Any probabilistic polynomial-time algorithm $A$ that receives as input $(\sigma, b^*)$, where $b^* \leftarrow B(1^n)$ and $(\sigma, \tau, \beta) \leftarrow G(1^n, b^*)$, has only a negligible probability of outputting an element $b \in \beta \setminus \{b^*\}$ (where the probability is taken over the randomness of $B$, $G$, and $A$).

2.2 Correlation-Secure Trapdoor Functions

A collection of efficiently computable functions is a pair of algorithms $\mathcal{F} = (G, F)$, where $G$ is a key-generation algorithm used for sampling a description of a function, and $F$ is an evaluation algorithm used for evaluating a function on a given input. The following definition formalizes the notion of a $k$-wise product, which is a collection $\mathcal{F}_k$ consisting of all $k$-tuples of functions from $\mathcal{F}$.

Definition 2.3 ($k$-wise product). Let $\mathcal{F} = (G, F)$ be a collection of efficiently computable functions. For any integer $k$, we define the $k$-wise product $\mathcal{F}_k = (G_k, F_k)$ as follows:

— The key-generation algorithm $G_k$ on input $1^n$ invokes $k$ independent instances of $G(1^n)$ and outputs $(\sigma_1, \ldots, \sigma_k)$. That is, a function is sampled from $\mathcal{F}_k$ by independently sampling $k$ functions from $\mathcal{F}$.

— The evaluation algorithm $F_k$ on input $(\sigma_1, \ldots, \sigma_k, x_1, \ldots, x_k)$ invokes $F$ to evaluate each function $\sigma_i$ on $x_i$. I.e., $F_k(\sigma_1, \ldots, \sigma_k, x_1, \ldots, x_k) = (F(\sigma_1, x_1), \ldots, F(\sigma_k, x_k))$.
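Definition 2.3 is a purely syntactic wrapper around any collection $(G, F)$. The Python sketch below builds $(G_k, F_k)$ from a hypothetical toy collection of affine maps $x \mapsto ax + b \bmod 2^{16}$; the toy collection exists only to make the wrapper runnable and is not one-way.

```python
import random
from typing import List, Tuple

Index = Tuple[int, int]
MOD = 2 ** 16                  # toy modulus for the hypothetical collection below

def G(rng: random.Random) -> Index:
    """Hypothetical key generation: sample an affine map x -> a*x + b mod MOD (a odd)."""
    return (rng.randrange(1, MOD, 2), rng.randrange(MOD))

def F(sigma: Index, x: int) -> int:
    """Evaluate the function described by sigma on input x."""
    a, b = sigma
    return (a * x + b) % MOD

def G_k(k: int, rng: random.Random) -> List[Index]:
    """k independent invocations of G."""
    return [G(rng) for _ in range(k)]

def F_k(sigmas: List[Index], xs: List[int]) -> List[int]:
    """Evaluate each sigma_i on its own input x_i."""
    if len(sigmas) != len(xs):
        raise ValueError("need exactly one input per function")
    return [F(s, x) for s, x in zip(sigmas, xs)]

rng = random.Random(3)
sigmas = G_k(3, rng)
x = rng.randrange(MOD)
outputs = F_k(sigmas, [x, x, x])   # the same x fed to all three functions
```

Feeding the same $x$ to every coordinate, as in the last line, is exactly the uniform $k$-repetition input distribution discussed after Definition 2.4.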

A one-way function is a function that is efficiently computable but is hard to invert given the image of a uniformly chosen input. This notion extends naturally to one-wayness under any specified input distribution, not necessarily the uniform distribution. Specifically, we say that a function is one-way with respect to an input distribution $\mathcal{I}$ if it is efficiently computable but hard to invert given the image of a random input sampled according to $\mathcal{I}$.

In the context of $k$-wise products, a straightforward argument shows that for any collection $\mathcal{F}$ which is one-way with respect to some input distribution $\mathcal{I}$, the $k$-wise product $\mathcal{F}_k$ is one-way with respect to the input distribution that samples $k$ independent inputs from $\mathcal{I}$. The following definition formalizes the notion of one-wayness under correlated inputs, where the inputs for $\mathcal{F}_k$ may be correlated.

Definition 2.4 (One-wayness under correlated inputs). Let $\mathcal{F} = (G, F)$ be a collection of efficiently computable functions with domain $\{D_n\}_n$ and let $\mathcal{C}$ be a distribution where $\mathcal{C}(1^n)$ is distributed over $D_n^k = D_n \times \cdots \times D_n$ for some integer $k = k(n)$. We say that $\mathcal{F}$ is one-way under $\mathcal{C}$-correlated inputs if $\mathcal{F}_k$ is one-way with respect to the input distribution $\mathcal{C}$.

For the special case that the distribution $\mathcal{C}$ is the uniform $k$-repetition distribution (i.e., $\mathcal{C}$ samples a uniformly random input $x \in D_n$ and outputs $k$ copies of $x$), we say that $\mathcal{F}$ is one-way under $k$-correlated inputs. Rosen and Segev [26, Theorem 3.3] show that a collection of $(n, \ell)$-lossy trapdoor functions can be used to construct a collection $\mathcal{F}$ that is one-way under $k$-correlated inputs for any $k < \frac{n - \omega(\log n)}{n - \ell}$.
3 A Construction Based on the Quadratic Residuosity Assumption

Our construction is based on the modular squaring function $x \mapsto x^2 \bmod N$, where $N = PQ$ for prime numbers $P \equiv Q \equiv 3 \pmod 4$ (i.e., Blum integers). This is a 4-to-1 mapping on $\mathbb{Z}_N^*$, and the construction is obtained by embedding additional information in the output that reduces the number of preimages to either 2 (these are the lossy functions) or 1 (these are the injective functions) in a computationally indistinguishable manner. Although this results in one bit of lossiness when the functions are defined over $\mathbb{Z}_N^*$, all lossy trapdoor functions in a collection are required to share the same domain (i.e., the domain should depend only on the security parameter). We overcome this difficulty with a simple domain extension, which results in lossiness of $\log_2(4/3)$ bits.

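For any odd positive integer $N$, we denote by $\mathrm{JS}_N : \mathbb{Z} \to \{-1, 0, 1\}$ the Jacobi symbol mod $N$. We define functions $h, j : \mathbb{Z} \to \{0,1\}$ as follows:

$$h(x) = \begin{cases} 1, & \text{if } x > N/2, \\ 0, & \text{if } x < N/2, \end{cases} \qquad j(x) = \begin{cases} 1, & \text{if } \mathrm{JS}_N(x) = -1, \\ 0, & \text{otherwise.} \end{cases}$$

We define $h$ and $j$ on $\mathbb{Z}_N$ by representing elements of $\mathbb{Z}_N$ as integers between 0 and $N - 1$.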

Fact 3.1. Let $N = PQ$ where $P \equiv Q \equiv 3 \pmod 4$, and let $y \in \mathbb{Z}_N^*$ be a quadratic residue. Denote by $\{\pm x_0, \pm x_1\}$ the distinct solutions of the equation $x^2 \equiv y \pmod N$. Then $\mathrm{JS}_P(-1) = \mathrm{JS}_Q(-1) = -1$ and therefore

1. $\mathrm{JS}_N(x_0) = \mathrm{JS}_N(-x_0)$ and $\mathrm{JS}_N(x_1) = \mathrm{JS}_N(-x_1)$.

2. $\mathrm{JS}_N(x_0) = -\mathrm{JS}_N(x_1)$.

In particular, the four square roots of $y$ take all four values of $(h(x), j(x))$.

The construction. We define a 4-tuple $\mathcal{F} = (G_0, G_1, F, F^{-1})$ (recall Definition 2.1) as follows:

1. Sampling a lossy function: On input $1^n$ the algorithm $G_0$ chooses an $n$-bit modulus $N = PQ$, where $P \equiv Q \equiv 3 \pmod 4$ are $n/2$-bit prime numbers. Then it chooses a random $r \in \mathbb{Z}_N^*$ such that $\mathrm{JS}_N(r) = -1$, and a random $s \in \mathbb{Z}_N^*$ such that $\mathrm{JS}_N(s) = 1$ and $s$ is a quadratic residue. The function index is $\sigma = (N, r, s)$.

2. Sampling an injective function: On input $1^n$ the algorithm $G_1$ chooses an $n$-bit modulus $N = PQ$, where $P \equiv Q \equiv 3 \pmod 4$ are $n/2$-bit prime numbers. Then it chooses a random $r \in \mathbb{Z}_N^*$ such that $\mathrm{JS}_N(r) = -1$, and a random $s \in \mathbb{Z}_N^*$ such that $\mathrm{JS}_N(s) = 1$ and $s$ is a quadratic non-residue. The function index is $\sigma = (N, r, s)$, and the trapdoor is $\tau = (P, Q)$.
3. Evaluation: Given a function index $\sigma = (N, r, s)$ and $x \in \{0,1\}^n$, the algorithm $F$ interprets $x$ as an integer in the set $\{1, \ldots, 2^n\}$ and outputs

$$f_{N,r,s}(x) = \begin{cases} x^2 \cdot r^{j(x)} \cdot s^{h(x)} \bmod N, & \text{if } 1 \le x < N, \\ x, & \text{if } N \le x \le 2^n. \end{cases}$$

4. Inversion: Given a description of an injective function $\sigma = (N, r, s)$ together with its trapdoor $\tau = (P, Q)$ and $y = f_{N,r,s}(x)$, the algorithm $F^{-1}$ retrieves $x$ as follows. If $N \le y \le 2^n$, then the algorithm outputs $y$. Otherwise:

(a) Find $j(x)$ by computing $\mathrm{JS}_N(y)$ (note that $\mathrm{JS}_N(f_{N,r,s}(x)) = \mathrm{JS}_N(x)$). Let $y' = y \cdot r^{-j(x)}$.

(b) Find $h(x)$ by checking whether $y'$ is a quadratic residue mod $N$ (note that $h(x) = 1$ if and only if $y'$ is not a quadratic residue). Let $y'' = y' \cdot s^{-h(x)}$.

(c) Find all square roots of $y''$ in $\mathbb{Z}_N$, and output the one that agrees with both $j(x)$ and $h(x)$. (We use Fact 3.1 if $y'' \in \mathbb{Z}_N^*$, and note that if $1 < \gcd(y'', N) < N$, then $y''$ has two square roots that are negatives of each other.)
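The Python sketch below runs the construction at toy size to check that inversion succeeds on every input and to compare image sizes in the two modes. It assumes $P = 7$ and $Q = 11$ (so $N = 77$ and $n = 7$), picks $r$ and $s$ by deterministic search rather than at random, and finds square roots by brute force instead of via the Chinese Remainder Theorem; these choices are for illustration only and give no security.

```python
from math import gcd

P, Q = 7, 11                 # toy Blum primes, P = Q = 3 (mod 4)
N = P * Q                    # 77, a 7-bit modulus
n = N.bit_length()           # inputs are integers in {1, ..., 2**n}

def jacobi(a, m):
    """Jacobi symbol (a/m) for odd m > 0, via quadratic reciprocity."""
    a %= m
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if m % 8 in (3, 5):
                result = -result
        a, m = m, a
        if a % 4 == 3 and m % 4 == 3:
            result = -result
        a %= m
    return result if m == 1 else 0

def is_square(y):
    """Quadratic-residue test by Euler's criterion mod P and mod Q (needs the trapdoor)."""
    return all(pow(y % p, (p - 1) // 2, p) in (0, 1) for p in (P, Q))

def h(x):
    return 1 if 2 * x > N else 0

def j(x):
    return 1 if jacobi(x, N) == -1 else 0

def f(x, r, s):
    if x < N:
        return x * x * pow(r, j(x), N) * pow(s, h(x), N) % N
    return x                                   # N <= x <= 2**n: identity

def f_inv(y, r, s):
    if y >= N:
        return y
    jx = 1 if jacobi(y, N) == -1 else 0        # JS_N(f(x)) = JS_N(x)
    y1 = y * pow(r, -jx, N) % N
    hx = 0 if is_square(y1) else 1
    y2 = y1 * pow(s, -hx, N) % N               # y2 = x^2 mod N
    for z in range(1, N):                      # brute-force square roots (toy size only)
        if z * z % N == y2 and h(z) == hx and j(z) == jx:
            return z
    raise ValueError("y is not in the image")

units = [u for u in range(2, N) if gcd(u, N) == 1]
r = next(u for u in units if jacobi(u, N) == -1)
s_inj = next(u for u in units if jacobi(u, N) == 1 and not is_square(u))
s_lossy = next(u for u in units if jacobi(u, N) == 1 and is_square(u))

domain = range(1, 2 ** n + 1)
assert all(f_inv(f(x, r, s_inj), r, s_inj) == x for x in domain)
inj_size = len({f(x, r, s_inj) for x in domain})
lossy_size = len({f(x, r, s_lossy) for x in domain})
print(inj_size, lossy_size)  # 128 90
```

Python 3.8 or later is needed for modular inverses via `pow` with a negative exponent.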

We now prove that the above construction is indeed lossy based on the quadratic residuosity assumption. Let $J_N = \{x \in \mathbb{Z}_N^* : \mathrm{JS}_N(x) = 1\}$, and let $Q_N$ be the subgroup of squares in $\mathbb{Z}_N^*$. Then the quadratic residuosity assumption states that the two distributions obtained by sampling uniformly at random from $Q_N$ and from $J_N \setminus Q_N$ are computationally indistinguishable.

Theorem 3.2. Under the quadratic residuosity assumption, $\mathcal{F}$ is a collection of $(n, \log_2(4/3))$-lossy trapdoor functions.

Proof. First, it follows from the correctness of the inversion algorithm that $G_1$ outputs permutations on the set $\{1, \ldots, 2^n\}$. Next, we claim that $G_0$ outputs functions that are 2-to-1 on the set $\{1, \ldots, N - 1\}$. Suppose $y \in Q_N$. Since $s$ is a quadratic residue, Fact 3.1 implies that for each $(\eta, \ell) \in \{0,1\}^2$ there is an $x_{\eta,\ell}$ satisfying

$$x_{\eta,\ell}^2 = y \cdot s^{-\eta}, \qquad h(x_{\eta,\ell}) = \eta, \qquad j(x_{\eta,\ell}) = \ell.$$

Then for each $\eta \in \{0,1\}$ we have $f_{N,r,s}(x_{\eta,0}) = y$ and $f_{N,r,s}(x_{\eta,1}) = ry$. Thus each element in the set $Q_N \cup rQ_N$ has at least two preimages in $\mathbb{Z}_N^*$, and since this set has cardinality half that of $\mathbb{Z}_N^*$ we deduce that $f_{N,r,s}$ is 2-to-1 on $\mathbb{Z}_N^*$. A similar argument shows that every square in the group $X_P = \{x \in \mathbb{Z}_N : \gcd(x, N) = P\}$ has two preimages in $X_P$, and the same holds for $X_Q$. Since $\{1, \ldots, N - 1\} = \mathbb{Z}_N^* \cup X_P \cup X_Q$, the function $f_{N,r,s}$ is 2-to-1 on this whole set.

Since $N$ is an $n$-bit modulus (i.e., $2^{n-1} < N < 2^n$), the lossy functions are 2-to-1 on at least half of their domain, which implies that their image is of size at most $\frac{3}{4} \cdot 2^n = 2^{n - \log_2(4/3)}$. In addition, descriptions of lossy functions and injective functions differ only in the element $s$, which is a random element of the subgroup of $\mathbb{Z}_N^*$ with Jacobi symbol 1 that is a quadratic residue in the lossy case and a quadratic non-residue in the injective case. Therefore, the quadratic residuosity assumption implies that lossy functions are computationally indistinguishable from injective functions. □
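The counting step at the end of the proof can be written out explicitly. The first $N - 1$ inputs are mapped 2-to-1 into $\{1, \ldots, N - 1\}$, the remaining inputs are fixed, and since $N$ is odd with $2^{n-1} < N$ we have $(N - 1)/2 \ge 2^{n-2}$:

```latex
\begin{aligned}
\bigl|f_{N,r,s}(\{1,\dots,2^n\})\bigr|
  &= \frac{N-1}{2} + \bigl(2^n - N + 1\bigr)
   = 2^n - \frac{N-1}{2} \\
  &\le 2^n - 2^{n-2} = \tfrac{3}{4}\cdot 2^n = 2^{\,n-\log_2(4/3)} .
\end{aligned}
```

For instance, with the toy modulus $N = 77 = 7 \cdot 11$ (so $n = 7$), this count is $128 - 38 = 90 \le 96$.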
4 A Construction Based on the d-Linear Assumption

The d-Linear assumption [16, 27] is a generalization of the decision Diffie–Hellman assumption that may hold even in groups with an efficiently computable d-linear map. The 1-Linear assumption is DDH, while the 2-Linear assumption is also known as the Decision Linear assumption [4]. The assumption is as follows:

Definition 4.1. Let $d \geq 1$ be an integer, and let $G$ be a finite cyclic group of order $q$. We say the d-Linear assumption holds in $G$ if the distributions

$$\bigl\{(g_1, \ldots, g_d, g_1^{r_1}, \ldots, g_d^{r_d}, h, h^{r_1 + \cdots + r_d}) : g_1, \ldots, g_d, h \xleftarrow{R} G,\; r_1, \ldots, r_d \xleftarrow{R} \mathbb{Z}_q\bigr\},$$
$$\bigl\{(g_1, \ldots, g_d, g_1^{r_1}, \ldots, g_d^{r_d}, h, h^{s}) : g_1, \ldots, g_d, h \xleftarrow{R} G,\; r_1, \ldots, r_d, s \xleftarrow{R} \mathbb{Z}_q\bigr\}$$

are computationally indistinguishable.

For any $d \geq 1$, the d-Linear assumption implies the $(d + 1)$-Linear assumption [16, Lemma 3].

Peikert and Waters [25, Section 5] give lossy and all-but-one lossy trapdoor functions based on the DDH assumption. In the Peikert–Waters construction, the function index is an ElGamal encryption of an $n \times n$ matrix $M$ which is either the zero matrix (lossy mode) or the identity matrix (injective mode) using a finite cyclic group $G$ of order $p$. The DDH assumption in $G$ implies that these two encryptions cannot be distinguished. The construction can be generalized to d-Linear assumptions using generalized ElGamal encryption, but such schemes are less efficient since ElGamal based on the d-Linear assumption produces $d + 1$ group elements per ciphertext (see, e.g., [27]).

Our construction is based on the following basic observation from linear algebra: if $M$ is an $n \times n$ matrix over a finite field $\mathbb{F}_p$ and $x$ is a length-$n$ column vector, then the map $f_M : x \mapsto Mx$ has image of size $p^{\mathrm{Rk}(M)}$. If we restrict the domain to only binary vectors (i.e., those with entries in $\{0,1\}$), then the function $f_M$ is injective when $\mathrm{Rk}(M) = n$, and its inverse can be computed by $f_M^{-1} : y \mapsto M^{-1}y$. If on the other hand we have $\mathrm{Rk}(M) < n/\log_2(p)$, then $f_M$ is not injective even when the domain is restricted to binary vectors, since the image is contained in a subgroup of size less than $2^n$.

By performing the above linear algebra “in the exponent” of a group of order $p$, we can create lossy trapdoor functions based on DDH and the related d-Linear assumptions. In particular, for any $n$ the size of the function index is the same for all $d$.

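We will use the following notation: we let $\mathbb{F}_p$ denote a field of $p$ elements and $\mathrm{Rk}_d(\mathbb{F}_p^{n \times n})$ the set of $n \times n$ matrices over $\mathbb{F}_p$ of rank $d$. If we have a group $G$ of order $p$, an element $g \in G$, and a vector $x = (x_1, \ldots, x_n) \in \mathbb{F}_p^n$, then we define $g^x$ to be the column vector $(g^{x_1}, \ldots, g^{x_n}) \in G^n$. If $M = (a_{ij})$ is an $n \times n$ matrix over $\mathbb{F}_p$, we denote by $g^M$ the $n \times n$ matrix over $G$ given by $(g^{a_{ij}})$. Given a matrix $M = (a_{ij}) \in \mathbb{F}_p^{n \times n}$ and a column vector $\mathbf{g} = (g_1, \ldots, g_n) \in G^n$, we define $\mathbf{g}^M$ by

$$\mathbf{g}^M = \Bigl(\prod_{j=1}^{n} g_j^{a_{1j}}, \ldots, \prod_{j=1}^{n} g_j^{a_{nj}}\Bigr).$$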
Similarly, given a matrix $S = (g_{ij}) \in G^{n \times n}$ and a column vector $x = (x_1, \ldots, x_n) \in \mathbb{F}_p^n$, we define $S^x$ by

$$S^x = \Bigl(\prod_{j=1}^{n} g_{1j}^{x_j}, \ldots, \prod_{j=1}^{n} g_{nj}^{x_j}\Bigr).$$

With these definitions, we have $(g^M)^x = (g^x)^M = g^{Mx}$.

The construction. For any positive integer $d$ and any real number $\epsilon \in (0,1)$, we define a 4-tuple $\mathcal{F} = (G_0, G_1, F, F^{-1})$ (recall Definition 2.1) as follows:

1. Sampling a lossy function: On input $1^n$, the algorithm $G_0$ chooses at random a $\lceil \epsilon n/d \rceil$-bit prime $p$, a group $G$ of order $p$, and a generator $g$ of $G$. Then it chooses a matrix $M \xleftarrow{R} \mathrm{Rk}_d(\mathbb{F}_p^{n \times n})$ and computes $S = g^M \in G^{n \times n}$. The function index is $\sigma = S$.

2. Sampling an injective function: On input $1^n$, the algorithm $G_1$ chooses at random a $\lceil \epsilon n/d \rceil$-bit prime $p$, a group $G$ of order $p$, and a generator $g$ of $G$. Then it chooses a matrix $M \xleftarrow{R} \mathrm{Rk}_n(\mathbb{F}_p^{n \times n})$ and computes $S = g^M \in G^{n \times n}$. The function index is $\sigma = S$, and the trapdoor is $\tau = (g, M)$.

3. Evaluation: Given a function index $S$ and $x \in \{0,1\}^n$, we interpret $x$ as a binary column vector $x = (x_1, \ldots, x_n) \in \mathbb{F}_p^n$. The algorithm $F$ computes the function $f_S(x) = S^x$.

4. Inversion: Given a function index $S$, a trapdoor $\tau = (g, M)$, and a vector $y \in G^n$, we define $F^{-1}(\tau, y)$ as follows:

(a) Compute $h = (h_1, \ldots, h_n) \leftarrow y^{M^{-1}}$.

(b) Let $x_i = \log_g(h_i)$ for $i = 1, \ldots, n$.

(c) Output $x = (x_1, \ldots, x_n)$.
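A toy run of the construction for $d = 1$ (the DDH case) illustrates both modes. The Python sketch below assumes the order-11 subgroup of $\mathbb{Z}_{23}^*$ generated by 2 as the group $G$ and $n = 6$, both far too small for security, and samples the rank-1 matrix as an outer product $uv^{T}$ of nonzero vectors, which is one convenient way to obtain rank exactly 1. Inversion recovers each $x_i$ by testing whether $h_i \in \{1, g\}$, which is the entire discrete-logarithm step when $x$ is binary.

```python
import itertools
import random

q, p, g = 23, 11, 2        # toy group G: order-p subgroup of Z_q^*, generated by g
n = 6
rng = random.Random(5)

def inverse_mod_p(M):
    """Gauss-Jordan inverse of an n x n matrix over F_p; None if singular."""
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = next((i for i in range(c, n) if A[i][c] % p), None)
        if piv is None:
            return None
        A[c], A[piv] = A[piv], A[c]
        inv = pow(A[c][c], -1, p)
        A[c] = [a * inv % p for a in A[c]]
        for i in range(n):
            if i != c and A[i][c]:
                factor = A[i][c]
                A[i] = [(a - factor * b) % p for a, b in zip(A[i], A[c])]
    return [row[n:] for row in A]

def encode(M):
    """Function index S = g^M, computed entrywise."""
    return [[pow(g, a, q) for a in row] for row in M]

def evaluate(S, x):
    """f_S(x) = S^x: the i-th entry is prod_j S[i][j]^{x_j}."""
    out = []
    for row in S:
        acc = 1
        for s_ij, x_j in zip(row, x):
            acc = acc * pow(s_ij, x_j, q) % q
        out.append(acc)
    return out

def invert(M_inv, y):
    """h = y^{M^{-1}}; x_i = log_g(h_i), which must be 0 or 1 for binary inputs."""
    x = []
    for row in M_inv:
        h_i = 1
        for a, y_j in zip(row, y):
            h_i = h_i * pow(y_j, a, q) % q
        if h_i not in (1, g):
            return None                        # not the image of a binary vector
        x.append(0 if h_i == 1 else 1)
    return x

def sample_injective():
    while True:
        M = [[rng.randrange(p) for _ in range(n)] for _ in range(n)]
        M_inv = inverse_mod_p(M)
        if M_inv is not None:
            return M, M_inv

def sample_lossy():
    u = [rng.randrange(1, p) for _ in range(n)]
    v = [rng.randrange(1, p) for _ in range(n)]
    return [[ui * vj % p for vj in v] for ui in u]     # rank exactly 1

inputs = [list(bits) for bits in itertools.product((0, 1), repeat=n)]
M, M_inv = sample_injective()
S_inj, S_lossy = encode(M), encode(sample_lossy())
assert all(invert(M_inv, evaluate(S_inj, x)) == x for x in inputs)
inj_size = len({tuple(evaluate(S_inj, x)) for x in inputs})
lossy_size = len({tuple(evaluate(S_lossy, x)) for x in inputs})
print(inj_size, lossy_size)   # inj_size is 2**n = 64; lossy_size is at most p = 11
```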

Theorem 4.1. Suppose $\epsilon n > d$. If the d-Linear assumption holds for $G$, then the above family is a collection of $(n, (1 - \epsilon)n)$-lossy trapdoor functions.

Proof. We first note that in the lossy case, when $M$ is of rank $d$, the image of $f_S$ is contained in a subgroup of $G^n$ of size $p^d < 2^{\epsilon n}$. The condition $\epsilon n > d$ guarantees $p > 3$, so when $M$ is of rank $n$ the function $f_S$ is in fact injective. It is straightforward to verify that the inversion algorithm performs correctly for injective functions. Finally, by [20, Lemma A.1], the d-Linear assumption implies that the matrix $S$ when $M$ is of rank $n$ is computationally indistinguishable from the matrix $S$ when $M$ is of rank $d$. □

Note that the system’s security scales with the bit size of $p$, i.e., as $\epsilon n/d$. In addition, note that the discrete logarithms in the inversion step can be performed efficiently when $x$ is a binary vector. (Here we take advantage of the fact that the output of $F^{-1}$ is unspecified on inputs not in the image of $F$.)

We now describe the extension of the system to all-but-one lossy trapdoor functions, in the case where the parameter $d$ in the above construction is equal to 1. Let $I_n$ denote the $n \times n$ identity matrix. For any real number $\epsilon \in (0,1)$, we define a 4-tuple $\mathcal{F} = (B, G, F, F^{-1})$ (recall Definition 2.2) as follows:
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1. Sampling a branch: On input l ra , the algorithm B outputs a uniformly 
distributed b £ {1,...,2L e ™J}. 

2. Sampling a function: On input 1” and a lossy branch b*, the algorithm G 
chooses at random a |~ en \-bit prime p, a group G of order p , and a generator 
g of G. Then it chooses a matrix A A Rki(Fp Xn ) Let M = A~ b*I n £ Fp Xn 
and S = g M £ G nxn . The function index is cr = S, the trapdoor is r = 
(g, M), and the set of lossy branches is (3 = {6*, b* — Tr(H)}. 

3. Evaluation: Given a function index $S$, a branch $b$, and an input $x \in \{0,1\}^n$, we interpret $x$ as a binary column vector $x = (x_1, \ldots, x_n)$. The algorithm $F$ computes the function $f_{S,b}(x) = S^x \ast g^{bx}$, where $\ast$ indicates the componentwise product of elements of $\mathbb{G}^n$.

4. Inversion: Given a function index $S$, a trapdoor $\tau = (g, M)$, a branch $b$, and a vector $y \in \mathbb{G}^n$, we define $F^{-1}(\tau, b, y)$ as follows:

(a) If $M + bI_n$ is not invertible, output $\perp$.

(b) Compute $h = (h_1, \ldots, h_n) \leftarrow y^{(M + bI_n)^{-1}}$.

(c) Let $x_i = \log_g(h_i)$ for $i = 1, \ldots, n$.

(d) Output $x = (x_1, \ldots, x_n)$.
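The four algorithms above can be exercised end to end on a toy instance. The Python sketch below is illustrative only and makes several assumptions that are not in the text: the group is the order-11 subgroup of $\mathbb{Z}_{23}^*$ generated by $g = 4$ (hopelessly small), the rank-1 matrix $A = uv^T$ is passed in through hypothetical vectors `u`, `v` instead of being sampled from $\mathrm{Rk}_1$, branches are reduced mod $p$, and $g$ is treated as public because evaluation needs $g^{bx}$.

```python
from itertools import product

# Toy parameters (illustrative only, far too small to be secure): Q = 2P + 1 is a
# safe prime and the group is the order-P subgroup of Z_Q^*, generated by G = 4.
P, Q, G = 11, 23, 4


def mat_inv_mod(m, p):
    """Invert a square matrix over F_p by Gauss-Jordan elimination; None if singular."""
    n = len(m)
    a = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col] % p), None)
        if piv is None:
            return None
        a[col], a[piv] = a[piv], a[col]
        inv = pow(a[col][col], -1, p)
        a[col] = [v * inv % p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(vr - f * vc) % p for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def keygen(b_star, u, v):
    """Index S = g^(A - b* I) for the rank-1 matrix A = u v^T (u, v nonzero)."""
    n = len(u)
    A = [[ui * vj % P for vj in v] for ui in u]
    M = [[(A[i][j] - (b_star if i == j else 0)) % P for j in range(n)] for i in range(n)]
    S = [[pow(G, M[i][j], Q) for j in range(n)] for i in range(n)]
    trace = sum(A[i][i] for i in range(n)) % P
    lossy_branches = {b_star % P, (b_star - trace) % P}
    return S, (G, M), lossy_branches


def evaluate(S, b, x):
    """f_{S,b}(x) = S^x * g^(b x), i.e. y_i = g^(b x_i) * prod_j S_ij^(x_j) mod Q."""
    y = []
    for i, row in enumerate(S):
        acc = pow(G, b * x[i] % P, Q)
        for s_ij, x_j in zip(row, x):
            if x_j:
                acc = acc * s_ij % Q
        y.append(acc)
    return tuple(y)


def invert(trapdoor, b, y):
    """Steps (a)-(d): h = y^((M + b I)^-1), then x_i = log_g(h_i) with x_i in {0, 1}."""
    g, M = trapdoor
    n = len(M)
    N = [[(M[i][j] + (b if i == j else 0)) % P for j in range(n)] for i in range(n)]
    N_inv = mat_inv_mod(N, P)
    if N_inv is None:
        return None  # step (a): output "bottom"
    x = []
    for i in range(n):
        h_i = 1
        for j in range(n):
            h_i = h_i * pow(y[j], N_inv[i][j], Q) % Q
        if h_i not in (1, g):  # for binary x the discrete log is a 2-entry lookup
            raise ValueError("y is not in the image of f_{S,b}")
        x.append(0 if h_i == 1 else 1)
    return tuple(x)


# Usage: n = 6, lossy branch b* = 3, A = u v^T with Tr(A) = 21 = 10 (mod 11).
S, td, beta = keygen(3, u=[1, 2, 3, 4, 5, 6], v=[1] * 6)
inputs = list(product((0, 1), repeat=6))
injective_images = {evaluate(S, 1, x) for x in inputs}  # branch 1 is not in beta
lossy_images = {evaluate(S, 3, x) for x in inputs}      # branch b* = 3
```

Because $x$ is binary, step (c) reduces to testing $h_i \in \{1, g\}$, which is why the discrete logarithms in the inversion step are cheap.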

Theorem 4.2. Suppose $\epsilon n > 1$. If the DDH assumption holds for $\mathbb{G}$, then the above family is a collection of $(n, (1-\epsilon)n)$-all-but-one lossy trapdoor functions.

Proof. We first observe that if $A$ is the rank 1 matrix computed by $G(1^n, b^*)$, then

$$f_{S,b}(x) = g^{(A - (b^* - b) I_n)x}. \tag{4.1}$$

We now verify each property of Definition 2.2. Properties (1) and (2) are immediate. To verify property (3), note that (4.1) implies that $f_{S,b^*}(x) = g^{Ax}$. Since $A$ has rank 1, the image of $f_{S,b^*}$ is contained in a subgroup of $\mathbb{G}^n$ of size $p < 2^{\epsilon n}$.

To check property (4), we observe that the condition $\epsilon n > 1$ guarantees $p > 3$, so when $A - (b^* - b)I_n$ is invertible the function $f_{S,b}$ is injective. The matrix $A - (b^* - b)I_n$ is not invertible if and only if $b^* - b$ is an eigenvalue of $A$. Since $A$ has rank 1, its eigenvalues are $0$ and $\mathrm{Tr}(A)$. Thus $b^* - b$ is an eigenvalue of $A$ if and only if $b \in \beta$, and $f_{S,b}$ is injective for all $b \notin \beta$. It is straightforward to verify that the inversion algorithm performs correctly whenever $b \notin \beta$, so property (5) holds.

Properties (6) and (7) follow from the DDH assumption for G. We show 
property (6) by constructing a sequence of games: 

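$\mathrm{Game}_0$: This is the real security game. The adversary is given $b_0$, $b_1$, and $g^{A - b_\omega I_n}$ for $\omega \xleftarrow{\$} \{0,1\}$ and $A \xleftarrow{\$} \mathrm{Rk}_1(\mathbb{F}_p^{n \times n})$, and outputs a bit $\omega'$. The adversary wins if $\omega' = \omega$.

$\mathrm{Game}_1$: The same as $\mathrm{Game}_0$, except the challenge is $g^{A' - b_\omega I_n}$ for some full-rank matrix $A' \xleftarrow{\$} \mathrm{Rk}_n(\mathbb{F}_p^{n \times n})$.

$\mathrm{Game}_2$: The same as $\mathrm{Game}_1$, except the challenge is $g^{U - b_\omega I_n}$ for some uniform matrix $U \xleftarrow{\$} \mathbb{F}_p^{n \times n}$.

$\mathrm{Game}_3$: The same as $\mathrm{Game}_2$, except the challenge is $g^U$.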
Since the $\mathrm{Game}_3$ challenge is independent of $\omega$, the advantage of any adversary playing $\mathrm{Game}_3$ is zero. We now show that if the DDH assumption holds for $\mathbb{G}$, then for $i = 0, 1, 2$, no polynomial-time adversary $\mathcal{A}$ can distinguish $\mathrm{Game}_i$ from $\mathrm{Game}_{i+1}$ with non-negligible advantage.

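$i = 0$: Any algorithm that distinguishes $\mathrm{Game}_0$ from $\mathrm{Game}_1$ can be used to distinguish the distributions $\{g^A : A \xleftarrow{\$} \mathrm{Rk}_1(\mathbb{F}_p^{n \times n})\}$ and $\{g^A : A \xleftarrow{\$} \mathrm{Rk}_n(\mathbb{F}_p^{n \times n})\}$. By [?, Lemma 1], any algorithm that distinguishes these distributions can solve the DDH problem in $\mathbb{G}$.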

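$i = 1$: The fraction of full-rank matrices among all matrices in $\mathbb{F}_p^{n \times n}$ is $\prod_{j=1}^{n}(1 - p^{-j}) > 1 - 1/(p-1)$, so even an unbounded adversary can distinguish $\mathrm{Game}_1$ from $\mathrm{Game}_2$ with advantage less than $1/(p-1)$, which is negligible since $p$ is a $\lceil \epsilon n \rceil$-bit prime.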

$i = 2$: Since the matrix $U$ is uniform in $\mathbb{F}_p^{n \times n}$, the matrix $U - b_\omega I_n$ is also uniform in $\mathbb{F}_p^{n \times n}$, so $\mathrm{Game}_2$ and $\mathrm{Game}_3$ are identical.

We conclude that for any $b_0, b_1$, no polynomial-time adversary can win $\mathrm{Game}_0$ with non-negligible advantage.
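The step $i = 1$ is a counting argument: the statistical distance between the uniform distribution on $\mathrm{Rk}_n(\mathbb{F}_p^{n \times n})$ and the uniform distribution on all of $\mathbb{F}_p^{n \times n}$ equals one minus the fraction of invertible matrices. The small Python check below (brute force, so only tiny $n$ and $p$) compares that fraction with the standard closed form $\prod_{j=1}^{n}(1 - p^{-j})$ and confirms the resulting distance stays below $1/(p-1)$.

```python
from fractions import Fraction
from itertools import product


def is_invertible_mod_p(rows, p):
    """Forward Gaussian elimination over F_p; True iff the square matrix has full rank."""
    a = [list(r) for r in rows]
    n = len(a)
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col] % p), None)
        if piv is None:
            return False
        a[col], a[piv] = a[piv], a[col]
        inv = pow(a[col][col], -1, p)
        for r in range(col + 1, n):
            f = a[r][col] * inv % p
            a[r] = [(x - f * y) % p for x, y in zip(a[r], a[col])]
    return True


def invertible_fraction_bruteforce(n, p):
    """Exact fraction of the p^(n*n) matrices in F_p^(n x n) that are invertible."""
    total = hits = 0
    for entries in product(range(p), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        hits += is_invertible_mod_p(rows, p)
        total += 1
    return Fraction(hits, total)


def invertible_fraction_formula(n, p):
    """Closed form prod_{j=1}^{n} (1 - p^(-j))."""
    frac = Fraction(1)
    for j in range(1, n + 1):
        frac *= 1 - Fraction(1, p ** j)
    return frac


checks = []
for n, p in [(1, 5), (2, 2), (2, 3), (3, 2)]:
    exact = invertible_fraction_bruteforce(n, p)
    gap = 1 - exact  # statistical distance: uniform on Rk_n vs. uniform on all matrices
    checks.append((n, p, exact == invertible_fraction_formula(n, p), gap < Fraction(1, p - 1)))
```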

Finally, to demonstrate property (7) we show that any adversary $\mathcal{A}$ that produces an element of $\beta$ other than $b^*$, given $S$ and $b^*$, can be used to compute discrete logarithms in $\mathbb{G}$, contradicting the DDH assumption. Choose a matrix $A \xleftarrow{\$} \mathrm{Rk}_1(\mathbb{F}_p^{n \times n})$, and let $A'(X)$ be the $n \times n$ matrix over $\mathbb{F}_p[X]$ that is the matrix $A$ with the first row multiplied by $X$. For any value $X = t \neq 0$, the matrix $A'(t)$ is uniformly distributed in $\mathrm{Rk}_1(\mathbb{F}_p^{n \times n})$.

Let $(g, g^t)$ be a discrete logarithm challenge for $\mathbb{G}$. For any $b^*$ we compute the matrix $S = g^{A'(t) - b^* I_n}$ (whose first row can be computed from $g^t$ without knowing $t$) and give $(S, b^*)$ to the adversary $\mathcal{A}$. If the adversary outputs $b \in \beta$ with $b \neq b^*$, then we learn $\lambda = b^* - b = \mathrm{Tr}(A'(t))$, since this is the only nonzero eigenvalue of $A'(t)$. If $a_{ii}$ is the $i$th diagonal entry of $A$, this gives us an equation

$$a_{11} t + a_{22} + \cdots + a_{nn} = \lambda. \tag{4.2}$$

Since $a_{11} = 0$ only with probability $O(1/p)$, we can solve for $t$ with all but negligible probability. □
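The extraction of $t$ from (4.2) is a single modular division. The toy sketch below works directly with exponents in $\mathbb{F}_p$ for $p = 101$ (whereas the reduction itself only ever holds $g^t$), with hypothetical vectors `u`, `v` defining $A = uv^T$ and a hypothetical secret `t_secret`; it simulates the adversary's answer $b = b^* - \mathrm{Tr}(A'(t))$ and recovers $t$.

```python
P = 101  # toy prime (illustrative only)


def first_row_scaled(A, t, p):
    """A'(t): the matrix A with its first row multiplied by t, entries mod p."""
    return [[a * t % p for a in A[0]]] + [list(row) for row in A[1:]]


def trace_mod(A, p):
    return sum(A[i][i] for i in range(len(A))) % p


def recover_t(A, lam, p):
    """Solve (4.2): a_11 t + a_22 + ... + a_nn = lam (mod p); None if a_11 = 0."""
    a11 = A[0][0] % p
    if a11 == 0:
        return None
    rest = sum(A[i][i] for i in range(1, len(A))) % p
    return (lam - rest) * pow(a11, -1, p) % p


# Hypothetical rank-1 matrix A = u v^T and secret exponent t.
u, v = [3, 7, 2], [5, 1, 9]
A = [[ui * vj % P for vj in v] for ui in u]
t_secret = 42
b_star = 10
# An adversary finding the other lossy branch b = b* - Tr(A'(t)) reveals lam = b* - b.
b = (b_star - trace_mod(first_row_scaled(A, t_secret, P), P)) % P
lam = (b_star - b) % P
```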

If we choose any integer $d > 1$ and repeat the above construction with $p$ a $\lceil \epsilon n/d \rceil$-bit prime and $A$ a rank $d$ matrix, then we expect to obtain an all-but-one lossy trapdoor function under the $d$-Linear assumption. Indeed, the proofs of properties (1)-(6) carry through in a straightforward way. However, the above proof of property (7) does not seem to generalize. In particular, the generalization of (4.2) is the equation $\det(A'(t) - \lambda I_n) = 0$, which can be written as $ut + v = 0$ for some (known) $u, v \in \mathbb{F}_p$. When $d = 1$ the element $u = a_{11}$ is independent of $\lambda$, so we can conclude that it is nonzero with high probability; however, when $d \geq 2$ this is not the case. We thus leave as an open problem the completion of the proof for $d \geq 2$.
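One way to see why the equation is affine in $t$ (a short derivation of ours, not taken from the text): write $r_1, \ldots, r_n$ for the rows of $A$ and $e_1, \ldots, e_n$ for the standard unit row vectors. Only the first row of $A'(t) - \lambda I_n$, namely $t r_1 - \lambda e_1$, depends on $t$, so multilinearity of the determinant in its first row gives

```latex
\det\bigl(A'(t) - \lambda I_n\bigr)
  = t \cdot \underbrace{\det\begin{pmatrix} r_1 \\ r_2 - \lambda e_2 \\ \vdots \\ r_n - \lambda e_n \end{pmatrix}}_{u}
  \;+\; \underbrace{\det\begin{pmatrix} -\lambda e_1 \\ r_2 - \lambda e_2 \\ \vdots \\ r_n - \lambda e_n \end{pmatrix}}_{v}
```

Here $u$ involves the rows $r_i - \lambda e_i$, so in general it depends on the adversary's $\lambda$; this is consistent with the remark that for $d \geq 2$ one cannot argue $u \neq 0$ independently of $\lambda$.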

5 Correlated Input Security from Syndrome Decoding 

Our construction is based on Niederreiter's coding-based encryption system [?], which itself is the dual of the McEliece encryption system [?].
Let $0 < \rho = \rho(n) < 1$ and $0 < \delta = \delta(n) < 1/2$ be two functions in the security parameter $n$. We set the domain $D_{n,\delta}$ to be the set of all $n$-bit strings with Hamming weight $\delta n$. Note that $D_{n,\delta}$ is efficiently samplable (see, e.g., [?]). The Niederreiter trapdoor function $T = (G, F, F^{-1})$ is defined as follows.

- Key generation: On input $1^n$ the algorithm $G$ chooses at random a nonsingular binary $\rho n \times \rho n$ matrix $S$, an $(n, n - \rho n, \delta n)$-linear binary Goppa code capable of correcting up to $\delta n$ errors (given by its $\rho n \times n$ binary parity-check matrix $G$), and an $n \times n$ permutation matrix $P$. It sets $H := SGP$, which is a binary $\rho n \times n$ matrix. The description of the function is $\sigma = H$, the trapdoor is $\tau = (S, G, P)$.

- Evaluation: Given a description $H$ of a function and $x \in \{0,1\}^n$ with Hamming weight $\delta n$, the algorithm $F$ computes the function $f_H(x) = Hx \in \{0,1\}^{\rho n}$.

- Inversion: Given the trapdoor $(S, G, P)$ and $y = Hx$, the algorithm $F^{-1}$ computes $S^{-1}y = GPx$, applies a syndrome decoding algorithm for $G$ to recover $x' = Px$, and computes $x = P^{-1}x'$.
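The three algorithms can be traced on a toy instance. The Python sketch below makes one loudly labeled substitution: the binary Goppa code is replaced by the [7,4] Hamming code ($\rho n = 3$, $n = 7$, $\delta n = 1$), whose syndrome directly names the error position. The matrix `S` and the permutation `perm` are hypothetical fixed choices rather than random ones, so the sketch shows only the algebra of $H = SGP$ and of inversion, with no security whatsoever.

```python
# [7,4] Hamming code standing in for a Goppa code (hypothetical toy, NOT secure).
# Column j of the parity-check matrix is the 3-bit binary expansion of j + 1,
# most significant bit in row 0, so a single error is located by its syndrome.
G_PAR = [[((j + 1) >> (2 - i)) & 1 for j in range(7)] for i in range(3)]


def mat_mul(A, B):
    """Matrix product over GF(2)."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)] for row in A]


def mat_vec(A, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a * b for a, b in zip(row, x)) % 2 for row in A]


def inv_gf2(A):
    """Gauss-Jordan inverse over GF(2)."""
    n = len(A)
    a = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col]), None)
        if piv is None:
            raise ValueError("matrix is singular over GF(2)")
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col and a[r][col]:
                a[r] = [x ^ y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def perm_matrix(perm):
    """Permutation matrix P with (P x)_i = x_{perm[i]}."""
    n = len(perm)
    return [[int(perm[i] == j) for j in range(n)] for i in range(n)]


def keygen(S, perm):
    """H := S G P; S and perm are supplied explicitly here instead of sampled."""
    P = perm_matrix(perm)
    return mat_mul(mat_mul(S, G_PAR), P), (S, G_PAR, P)


def evaluate(H, x):
    """f_H(x) = H x over GF(2)."""
    return mat_vec(H, x)


def invert(trapdoor, y):
    S, G, P = trapdoor
    syndrome = mat_vec(inv_gf2(S), y)  # S^{-1} y = G P x
    x_prime = [0] * len(P)             # x' = P x, of weight 1
    if any(syndrome):
        x_prime[int("".join(map(str, syndrome)), 2) - 1] = 1  # Hamming decoding
    # x = P^{-1} x' = P^T x' because P is a permutation matrix
    return [sum(P[i][j] * x_prime[i] for i in range(len(P))) % 2 for j in range(len(P))]


S = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]  # hypothetical nonsingular 3 x 3 matrix
H, trapdoor = keygen(S, perm=[3, 0, 6, 1, 5, 2, 4])
domain = [[int(i == j) for j in range(7)] for i in range(7)]  # D_{7,1/7}: weight-1 inputs
```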

The Niederreiter trapdoor function can be proved one-way under the indistinguishability and syndrome decoding assumptions, which are indexed by the parameters $0 < \rho < 1$ and $0 < \delta < 1/2$.

Indistinguishability assumption. The binary $\rho n \times n$ matrix $H$ output by $G(1^n)$ is computationally indistinguishable from a uniform matrix of the same dimensions.

Syndrome decoding assumption. The collection of functions defined as $f_U(x) := Ux$ for a uniform $\rho n \times n$ binary matrix $U$ is one-way on domain $D_{n,\delta}$.

Choosing the weight $\delta$ to be close to the Gilbert-Varshamov bound is commonly believed to give hard instances for the syndrome decoding problem. The Gilbert-Varshamov bound for an $(n, k, \delta n)$ linear code with $\delta < 1/2$ is given by the equation $k/n < 1 - H_2(\delta)$, where $H_2(\delta) := -\delta \log_2 \delta - (1 - \delta)\log_2(1 - \delta)$. It is therefore assumed that the syndrome decoding assumption holds for all $0 < \delta < 1/2$ satisfying $H_2(\delta) < \rho$ [12]. Note that one-wayness also implies that the cardinality of $D_{n,\delta}$ is super-polynomial in $n$.
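The quantities in this paragraph are easy to compute. The Python sketch below evaluates $H_2(\delta)$, the heuristic parameter condition $H_2(\delta) < \rho$ (the helper name `sd_parameters_plausible` is ours), and $\log_2 |D_{n,\delta}| = \log_2 \binom{n}{\delta n}$. The standard estimate $2^{nH_2(\delta)}/(n+1) \le \binom{n}{\delta n} \le 2^{nH_2(\delta)}$ shows that $|D_{n,\delta}|$ grows exponentially in $n$ for constant $\delta$, matching the super-polynomial size noted above.

```python
import math


def binary_entropy(delta):
    """H_2(delta) = -delta log2 delta - (1 - delta) log2(1 - delta)."""
    if delta in (0, 1):
        return 0.0
    return -delta * math.log2(delta) - (1 - delta) * math.log2(1 - delta)


def log2_domain_size(n, delta):
    """log2 |D_{n,delta}| = log2 C(n, delta*n); assumes delta*n is an integer."""
    return math.log2(math.comb(n, round(delta * n)))


def sd_parameters_plausible(rho, delta):
    """Heuristic condition from the text: 0 < delta < 1/2 and H_2(delta) < rho."""
    return 0 < delta < 0.5 and binary_entropy(delta) < rho


# log2 |D_{n,1/8}| grows linearly in n, consistent with
# 2^(n H_2(delta)) / (n + 1) <= C(n, delta n) <= 2^(n H_2(delta)).
growth = [(n, log2_domain_size(n, 1 / 8), n * binary_entropy(1 / 8)) for n in (64, 256, 1024)]
```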

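The following theorem was proved in [12].

Theorem 5.1 ([12]). If the syndrome decoding assumption holds for $\rho$ and $\delta$, then the ensembles $\{(M, Mx) : M \xleftarrow{\$} \{0,1\}^{\rho n \times n};\ x \xleftarrow{\$} D_{n,\delta}\}_{n \in \mathbb{N}}$ and $\{(M, y) : M \xleftarrow{\$} \{0,1\}^{\rho n \times n};\ y \xleftarrow{\$} \{0,1\}^{\rho n}\}_{n \in \mathbb{N}}$ are computationally indistinguishable.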

This theorem implies that the Niederreiter trapdoor function is one-way under $k$-correlated inputs.

Theorem 5.2. Suppose $\rho$, $\delta$, and $k$ are chosen such that $\rho' := \rho k < 1$, and the indistinguishability and the syndrome decoding assumptions hold for parameters $\rho$ and $\delta$. Then the Niederreiter trapdoor function is one-way under $k$-correlated inputs.
Proof. Fix a probabilistic polynomial-time adversary $\mathcal{A}$ that plays the security game for one-wayness under $k$-correlated inputs. Define

$$\epsilon = \Pr[\mathcal{A}(H_1, \ldots, H_k, H_1(x), \ldots, H_k(x)) = x],$$

where $H_i \xleftarrow{\$} G(1^n)$ and $x \xleftarrow{\$} D_{n,\delta}$. We now exchange all the matrices $H_i$ for uniform matrices $U_i$ of the same dimension. By the indistinguishability assumption and a hybrid argument, we have that

$$\bigl|\Pr[\mathcal{A}(H_1, \ldots, H_k, H_1(x), \ldots, H_k(x)) = x] - \Pr[\mathcal{A}(U_1, \ldots, U_k, U_1(x), \ldots, U_k(x)) = x]\bigr| \in \mathrm{negl}(n).$$

For $\rho' := \rho k$, define the $\rho' n \times n$ matrix $U$ by stacking the matrices $U_i$ (i.e., concatenating their columns). Then the distributions $(U_1, \ldots, U_k, U_1(x), \ldots, U_k(x))$ and $(U, Ux)$ are identical. Since $H_2(\delta) < \rho k = \rho'$, we can apply Theorem 5.1 to obtain

$$\bigl|\Pr[\mathcal{A}(U, Ux) = x] - \Pr[\mathcal{A}(U, \mathcal{U}_{\rho' n}) = x]\bigr| \in \mathrm{negl}(n),$$

where $\mathcal{U}_{\rho' n}$ is a uniform bit-string in $\{0,1\}^{\rho' n}$. Observing that $\Pr[\mathcal{A}(U, \mathcal{U}_{\rho' n}) = x] \leq 1/|D_{n,\delta}| \in \mathrm{negl}(n)$ (since the Niederreiter function is assumed to be one-way) implies that $\epsilon$ is negligible. □
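The key step of the proof is that $k$ independent syndromes of the same $x$ form one larger syndrome. A minimal Python check with hypothetical toy sizes: stacking the $U_i$ vertically gives a matrix $U$ whose product with $x$ is exactly the concatenation $(U_1 x, \ldots, U_k x)$.

```python
import random


def mat_vec_gf2(M, x):
    """Matrix-vector product over GF(2)."""
    return [sum(m * xi for m, xi in zip(row, x)) % 2 for row in M]


def stack(mats):
    """Stack U_1, ..., U_k (each rho*n x n) into one (rho*k)n x n matrix U."""
    return [row for M in mats for row in M]


rng = random.Random(2010)
n, rows_each, k = 16, 4, 3  # hypothetical sizes: rho*n = 4, so rho'*n = 12
Us = [[[rng.randrange(2) for _ in range(n)] for _ in range(rows_each)] for _ in range(k)]
x = [rng.randrange(2) for _ in range(n)]
U = stack(Us)
concatenated_outputs = [bit for Ui in Us for bit in mat_vec_gf2(Ui, x)]  # (U_1 x, ..., U_k x)
stacked_output = mat_vec_gf2(U, x)                                        # U x
```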

We remark that the above proof implies that the Niederreiter trapdoor function has linearly many hard-core bits, which greatly improves the efficiency of the CCA-secure encryption scheme obtained by using the construction from [?].
Abstract. Lossy Trapdoor Functions (LTDFs), introduced by Peikert and Waters (STOC 2008), have been useful for building many cryptographic primitives. In particular, by using an LTDF that loses a $(1 - 1/\omega(\log n))$ fraction of all its input bits, it is possible to achieve CCA security using the LTDF as a black-box. Unfortunately, not all candidate LTDFs achieve such a high level of lossiness. In this paper we drastically lower the lossiness required to achieve CCA security, showing that an LTDF that loses only a noticeable fraction of a single bit can be used in a black-box way to build CCA-secure PKE. To show our result, we build on the recent result of Rosen and Segev (TCC 2009) that showed how to achieve CCA security from functions whose products are one-way on particular types of correlated inputs. Lastly, we give an example construction of a slightly lossy TDF based on the assumption that it is hard to distinguish the product of two primes from the product of three primes.


1 Introduction 

Lossy Trapdoor Functions (LTDFs), recently introduced by Peikert and Waters [15], have proven to be a useful tool both for giving new constructions of traditional cryptographic primitives and for constructing new primitives. Specifically, Peikert and Waters used LTDFs to construct one-way injective trapdoor functions, collision-resistant hash functions, CPA and CCA-secure encryption¹, and more. More recently, LTDFs were used to construct deterministic PKE schemes secure in the standard model [3], as well as PKE schemes secure under selective-opening attack [1].

Informally, an LTDF is an injective trapdoor function with a function description $g$ that is (computationally) indistinguishable from the description $\hat{g}$ of another function that statistically loses information about its input. In other words, the function $\hat{g}$ is non-injective, with some images having potentially many preimages. We say an LTDF $g$ (computationally) loses $\ell$ bits if the effective range size of the indistinguishable function $\hat{g}$ is at most a $1/2^{\ell}$-fraction of its domain size. LTDFs allow a useful and simple proof technique: in the honest execution of a protocol we use the injective function to get the correct functionality, while in the proof the “challenge” given to an adversary will use the lossy function. One can then do a statistical argument to complete the proof.

¹ By CCA-secure we mean CCA2-secure. See [5] for a good overview of all the ways currently used to achieve CCA security.

P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010, LNCS 6056, pp. 296-311, 2010.

Using LTDFs and this proof technique, Peikert and Waters show that an LTDF $f$ with input size a polynomial $n(\lambda)$ (where $\lambda$ is the security parameter) that loses $\omega(\log \lambda)$ bits is one-way. This is easy to see: if an inverter is given $\hat{g}(x)$, where $\hat{g}$ is the indistinguishable lossy function, then there are on average $2^{\omega(\log \lambda)}$ possible preimages, so the adversary has only a negligible probability of outputting the correct one. Applying known results, these one-way TDFs immediately give CPA-secure encryption using generic hardcore predicates [7]. Additionally, Peikert and Waters go on to show that LTDFs admit simple hardcore functions, resulting in efficient multi-bit encryption schemes.

To achieve CCA security from LTDFs, Peikert and Waters then show that any LTDF with enough lossiness can be used to construct an all-but-one trapdoor function (ABO), which can then be used to achieve CCA security. “Enough” lossiness turns out to be almost all of the input bits, which can be difficult to achieve. Peikert and Waters get enough lossiness from a DDH-based construction; however, their lattice-based construction only loses a constant fraction of the input bits, which turns out to be insufficient for the general construction. Thus, to get CCA security from lattice-based assumptions, they need to give a direct construction of an ABO.

Since the original paper, more constructions of LTDFs have been proposed. Rosen and Segev [20] and Boldyreva, Fehr, and O'Neill [3] both gave a construction based on the decisional composite residuosity (DCR) assumption, while Kiltz, O'Neill, and Smith [5] show that the RSA trapdoor permutation is lossy under the phi-hiding assumption of [3]. While the DCR-based LTDF has enough lossiness to construct ABOs and achieve CCA security, RSA only loses a constant fraction (less than one-half) of the input bits and thus cannot be used to construct an ABO using the general construction.

Correlated Products. Rosen and Segev [21] recently generalized the ABO technique for achieving CCA security by giving a sufficient, strictly computational assumption on the underlying TDFs. They called their notion one-wayness under correlated products. It is well known that for a polynomially-bounded $w$, sampling $w$ functions independently from a family of one-way functions and applying them to independent uniform inputs still results in a one-way function, and even amplifies the one-wayness. Rosen and Segev investigated the case when the inputs are not necessarily independent and uniform, but are instead correlated in some way. They went on to show how to get CCA security from a function family that is one-way with respect to specific distributions $C_w$ of $w$ correlated inputs. Specifically, the distributions they use have the property that given any $d < w$ of the inputs the entire input vector can be reconstructed. (We call such distributions $(d,w)$-subset reconstructible; see Section 3 for details.) The simplest such distribution arises when $d = 1$, which Rosen and Segev call




P. Mol and S. Yilek 


the $w$-repetition distribution. In this case, independently sampled functions are each applied to the same input².

Of course, this notion is useful only if there exist TDFs that are one-way 
under such correlations. Rosen and Segev show that LTDFs with enough lossiness 
satisfy the requirements. The amount of lossiness they require turns out to be 
approximately the same amount needed by Peikert and Waters to go from an 
LTDF to an ABO. This amount, as we said, is more than any constant fraction 
of the input bits, ruling out numerous LTDFs. 

Our Results. We extend the results of [15] and [21] and show that only a noticeable fraction of a single bit of lossiness is sufficient for building IND-CCA-secure encryption. Our results lower the required lossiness from a $(1 - 1/\omega(\log \lambda))$-fraction of all the input bits to just a $1/\mathrm{poly}$ fraction of one bit. This solves an open problem from (the most recent version [14] of) [15] and additionally further confirms the usefulness of the correlated product formalization of Rosen and Segev. Our result also immediately implies that the LTDF construction based on the RSA function from [5] as well as the lattice-based construction from [15] can now be used in a black-box way to achieve CCA security.

To achieve our result, we first prove a straightforward theorem bounding the amount of lossiness required of an LTDF in order to argue that its $w$-wise product is one-way with respect to a correlated input distribution $C_w$ with min-entropy $\mu$. We then show that if we instantiate the error-correcting code in the Rosen-Segev construction with Reed-Solomon codes and carefully choose the parameters, then we can use a correlated input distribution $C_w$ with enough min-entropy $\mu$ that we only need an LTDF that loses about two bits. Since it is easy to amplify the quantity of lossiness (not the rate), we can get an LTDF that loses two bits from any LTDF that loses only a noticeable fraction of a bit.

Since we have significantly lowered the amount of lossiness needed for CCA security, we hope that it will be possible to achieve CCA security via LTDFs from a wider variety of assumptions. Towards this goal, we give an example of how to build a slightly lossy TDF using an assumption from which it is not clear how to build an LTDF with significantly more lossiness. Our LTDF is based on modular squaring and it loses a constant fraction of one bit under the assumption that it is hard to distinguish the product of two primes from the product of three primes [2]. Our results described above immediately give us CCA security from this assumption³. Interestingly, Freeman, Goldreich, Kiltz, Rosen, and Segev [5] independently describe an LTDF that loses one bit under the quadratic residuosity assumption. Our result allows them to achieve CCA security from this slightly lossy TDF in a black-box way.

A Closer Look. To see why slightly lossy TDFs are sufficient for building 
a variety of cryptographic primitives, let us first focus on building CPA-secure 


² Rosen and Segev focused on the $w$-repetition case in the proceedings version of their paper [?]. See their full version [?] for details on the more general case.

³ It should be noted that this assumption is clearly stronger than other assumptions from which we already know how to achieve CCA security (e.g., factoring [8]).
encryption. For simplicity, say that we have a family $\mathcal{F}$ of LTDFs with domain $\{0,1\}^n$ that (computationally) loses 1 bit. Now consider a new family of LTDFs which is simply the $w$-wise product of $\mathcal{F}$ for $w = \mathrm{poly}(\lambda)$, where $\lambda$ is the security parameter. This means that to sample a function from the product family we independently sample $w$ functions from $\mathcal{F}$; the domain of the product family is $\{0,1\}^{nw}$. It is easy to see that such a family computationally loses $w = \mathrm{poly}(\lambda)$ bits and, applying the results of [15], is thus one-way. Applying generic hardcore predicates, this immediately gives us a CPA-secure encryption scheme.

The Rosen-Segev encryption scheme is similar, but one important difference is that the input distribution to the function chosen from the product family is no longer uniform over $\{0,1\}^{nw}$, but instead correlated (recall that it is what we call $(d,w)$-subset reconstructible). This helps provide the ability to answer decryption queries in the proof. Rosen and Segev focused on the case $d = 1$, which means each of the $w$ functions that make up the product function is applied to the same input. If such functions are not very lossy, too much information about the input will leak. We show that by choosing an appropriate error-correcting code in the RS construction and by carefully setting the parameters, we can instead set $d$ large relative to $w$ and thus get enough entropy in the input distribution to argue one-wayness and achieve CCA security when using only slightly lossy TDFs in the $w$-wise product function.

Open Directions. An interesting open question is whether we can achieve CCA security based on other hardness assumptions. For example, is it possible to construct slightly lossy trapdoor functions from hardness assumptions from which we don't already know how to achieve CCA security? Another interesting question is whether LTDFs with a small amount of lossiness are sufficient for constructing other primitives such as collision-resistant hash functions. Lastly, another challenging direction is developing techniques for amplifying the lossiness rate, i.e., increasing the ratio of lossiness to input size.

2 Preliminaries 

Notation. Throughout the paper, $\lambda$ denotes a security parameter. For a random variable $X$, we let $x \xleftarrow{\$} X$ denote choosing a value at random according to (the distribution of) $X$ and assigning it to $x$. We say a function $\mu(\cdot)$ is negligible if $\mu(\lambda) \in \lambda^{-\omega(1)}$ and is noticeable if $\mu(\lambda) \in \lambda^{-O(1)}$. We let $\mathrm{negl}(\lambda)$ denote an arbitrary negligible function, $\mathrm{poly}(\lambda)$ a polynomially bounded function, and $\frac{1}{\mathrm{poly}(\lambda)}$ an arbitrary noticeable function.

Probability Background. Let $X, Y$ be two (discrete) random variables distributed over a countable set $V$ according to $\mathcal{D}_X$ and $\mathcal{D}_Y$ respectively. The statistical distance between $X$ and $Y$ (or between $\mathcal{D}_X$ and $\mathcal{D}_Y$) is defined as

$$\Delta(X, Y) = \frac{1}{2} \sum_{v \in V} \bigl|\Pr[X = v] - \Pr[Y = v]\bigr|.$$

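For two random variable ensembles $\mathcal{X} = \{X_\lambda\}_{\lambda \in \mathbb{N}}$ and $\mathcal{Y} = \{Y_\lambda\}_{\lambda \in \mathbb{N}}$ indexed by a (security) parameter $\lambda$, we say that $\mathcal{X}$ and $\mathcal{Y}$ are statistically indistinguishable (denoted $\mathcal{X} \approx_s \mathcal{Y}$) if $\Delta(X_\lambda, Y_\lambda) = \mathrm{negl}(\lambda)$. Likewise, $\mathcal{X}$ and $\mathcal{Y}$ are computationally indistinguishable (denoted $\mathcal{X} \approx_c \mathcal{Y}$) if

$$\bigl|\Pr[\mathcal{A}(X_\lambda) = 1] - \Pr[\mathcal{A}(Y_\lambda) = 1]\bigr| = \mathrm{negl}(\lambda)$$

for any PPT algorithm $\mathcal{A}$ (where the probability is taken over the randomness of $\mathcal{A}$ and the random variables $X_\lambda, Y_\lambda$).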
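For finite distributions the statistical distance is directly computable. The Python sketch below (dictionaries mapping outcomes to exact `Fraction` probabilities, an illustrative choice of ours) computes $\Delta(X, Y)$ and, by brute force over all events, the best advantage $\max_T |\Pr[X \in T] - \Pr[Y \in T]|$. The two agree, which is the standard reason a negligible statistical distance bounds the advantage of every distinguisher, efficient or not.

```python
from fractions import Fraction
from itertools import chain, combinations


def statistical_distance(px, py):
    """Delta(X, Y) = 1/2 * sum_v |Pr[X = v] - Pr[Y = v]| for finite dicts {v: prob}."""
    support = set(px) | set(py)
    return sum(abs(px.get(v, 0) - py.get(v, 0)) for v in support) / 2


def best_event_advantage(px, py):
    """max over events T of |Pr[X in T] - Pr[Y in T]|, by brute force over subsets."""
    support = sorted(set(px) | set(py))
    events = chain.from_iterable(combinations(support, r) for r in range(len(support) + 1))
    return max(abs(sum(px.get(v, 0) for v in T) - sum(py.get(v, 0) for v in T)) for T in events)


X = {v: Fraction(1, 4) for v in range(4)}  # uniform on {0, 1, 2, 3}
Y = {v: Fraction(1, 3) for v in range(3)}  # uniform on {0, 1, 2}
```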

For a random variable $X$ taking values in a domain $\mathcal{X}$, we define its min-entropy as

$$H_\infty(X) = -\log\bigl(\max_{x \in \mathcal{X}} \Pr[X = x]\bigr),$$

where $\max_{x \in \mathcal{X}} \Pr[X = x] = 2^{-H_\infty(X)}$ denotes the predictability of the random variable $X$.

Another useful notion of entropy is the average min-entropy (defined in [5]) of a random variable $X$ (given $Y$), which is defined as follows:

$$\tilde{H}_\infty(X \mid Y) = -\log\Bigl(\mathbb{E}_{y \leftarrow Y}\bigl[2^{-H_\infty(X \mid Y = y)}\bigr]\Bigr).$$

The average min-entropy expresses the average maximum probability of predicting $X$ given $Y$. The following lemma gives a useful bound on the remaining entropy of a random variable $X$ conditioned on the values of side information.

Lemma 1 ([?, Lemma 2.2b]). Let $X, Y, Z$ be random variables such that $Y$ takes at most $2^k$ values. Then

$$\tilde{H}_\infty(X \mid (Y, Z)) \geq \tilde{H}_\infty((X, Y) \mid Z) - k \geq \tilde{H}_\infty(X \mid Z) - k.$$

In particular, if $X$ is independent of $Z$ then $\tilde{H}_\infty(X \mid (Y, Z)) \geq H_\infty(X) - k$.
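Both entropy notions can be computed for small explicit distributions. In the Python sketch below (a toy example of ours), $X$ is uniform on 3-bit values and the leakage $Y = X \bmod 4$ takes $2^k$ values with $k = 2$; the average min-entropy drops from $H_\infty(X) = 3$ to exactly $3 - k$, so the bound of Lemma 1 (with $Z$ constant) is tight here.

```python
import math
from collections import defaultdict
from fractions import Fraction


def min_entropy(px):
    """H_inf(X) = -log2 max_x Pr[X = x] for a finite dict {x: prob}."""
    return -math.log2(float(max(px.values())))


def avg_min_entropy(joint):
    """~H_inf(X | Y) = -log2 E_{y <- Y}[max_x Pr[X = x | Y = y]] for a dict {(x, y): prob}.

    Uses E_y[max_x Pr[X = x | Y = y]] = sum_y max_x Pr[X = x, Y = y].
    """
    best = defaultdict(Fraction)
    for (x, y), pr in joint.items():
        best[y] = max(best[y], pr)
    return -math.log2(float(sum(best.values())))


# X uniform on 3-bit values; leakage Y = X mod 4 takes 2^k values with k = 2.
px = {x: Fraction(1, 8) for x in range(8)}
joint = {(x, x % 4): Fraction(1, 8) for x in range(8)}
k = 2
```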

Trapdoor Functions. A collection of injective trapdoor functions is a tuple of PT algorithms $\mathcal{F} = (G, F, F^{-1})$ such that the (probabilistic) algorithm $G$ outputs a pair $(s, t)$ consisting of a function index $s$ and a corresponding trapdoor $t$. The deterministic algorithm $F$, on input a function index $s$ and $x \in \{0,1\}^n$, outputs $f_s(x)$. Algorithm $F^{-1}$, given the trapdoor $t$, computes the inverse function $f_s^{-1}(\cdot)$. Consider a collection of injective trapdoor functions $\mathcal{F}$ with domain $\{0,1\}^{n(\lambda)}$ and let $X(1^\lambda)$ be a distribution over $\{0,1\}^{n(\lambda)}$. We say $\mathcal{F}$ is one-way with respect to $X$ if for all PPT adversaries $\mathcal{A}$ and every polynomial $p(\cdot)$ it follows that for all sufficiently large $\lambda$

$$\Pr\bigl[\mathcal{A}(1^\lambda, s, F(s, x)) = F^{-1}(t, F(s, x))\bigr] < \frac{1}{p(\lambda)},$$

where $(s, t) \xleftarrow{\$} G(1^\lambda)$ and $x \xleftarrow{\$} X(1^\lambda)$.

We say that $\mathcal{F}$ is $(n(\lambda), \ell(\lambda))$-lossy if there exists a PPT algorithm $\hat{G}$ that, on input security parameter $1^\lambda$, outputs $s$ and $t$ such that

- The first outputs of $G$ and $\hat{G}$ are computationally indistinguishable.

- For any $(s, t)$ output by $\hat{G}$, the map $F(s, \cdot)$ has image size at most $2^{n - \ell}$. We call $\ell$ the lossiness.

We will sometimes call a TDF that is lossy a lossy trapdoor function (LTDF).
Public-Key Encryption. A public-key encryption scheme is a triple $\mathcal{AE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ of PPT algorithms. The key generation algorithm $\mathcal{K}$, on input the security parameter $1^\lambda$, outputs a pair of keys $(pk, sk)$. The encryption algorithm $\mathcal{E}$ gets as input the public key $pk$ and a message $m \in \mathcal{M}$ (for some message space $\mathcal{M}$) and outputs a ciphertext $c$. The decryption algorithm $\mathcal{D}$, on input the secret key $sk$ and a ciphertext $c$, outputs a message $m$ or $\perp$ (failure). It is required that $\Pr[\mathcal{D}(sk, \mathcal{E}(pk, m)) \neq m] = \mathrm{negl}(\lambda)$, where the probability is taken over the randomness of $\mathcal{K}$, $\mathcal{E}$ and $\mathcal{D}$.

The strong notion of security for a public-key cryptosystem $\mathcal{AE} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ we consider in this paper is indistinguishability of ciphertexts under a chosen-ciphertext attack (IND-CCA) [13, 17]. We define IND-CCA security as a game between an adversary $\mathcal{A}$ and an environment as follows. The environment runs $\mathcal{K}(1^\lambda)$ to get a keypair $(pk, sk)$ and flips a bit $b$. It gives $pk$ to $\mathcal{A}$. $\mathcal{A}$ outputs a pair of messages $m_0, m_1 \in \mathcal{M}$ with $|m_0| = |m_1|$. The environment returns the challenge ciphertext $c \xleftarrow{\$} \mathcal{E}(pk, m_b)$ to $\mathcal{A}$. Additionally, throughout the entire game the adversary also has access to a decryption oracle Dec that, on input $c$, outputs $\mathcal{D}(sk, c)$. The one restriction we place on the adversary is that it may not query the challenge ciphertext to the decryption oracle, as this would lead to a trivial win. At the end of the game the adversary $\mathcal{A}$ returns a guess bit $b'$, and we say that $\mathcal{A}$ wins if $b' = b$. We define the IND-CCA advantage of an adversary $\mathcal{A}$ as

$$\mathrm{Adv}^{\text{ind-cca}}_{\mathcal{AE}}(\mathcal{A}) = 2 \cdot \Pr[\mathcal{A} \text{ wins}] - 1.$$

We say that $\mathcal{AE}$ is CCA-secure if $\mathrm{Adv}^{\text{ind-cca}}_{\mathcal{AE}}(\mathcal{A})$ is negligible in $\lambda$ for all PPT adversaries $\mathcal{A}$.

Error Correcting Codes. We use error correcting codes for the construction of the CCA-secure scheme⁴. In this section we review some basic definitions and facts from coding theory. We focus only on the material that is required for the security proof of our CCA construction. The reader is referred to [?] for a detailed treatment of the subject.

Let $\Sigma$ be a set of symbols (alphabet) with $|\Sigma| = q$. For two strings $x, y \in \Sigma^w$, the Hamming distance $d_H(x, y)$ is defined as the number of coordinates where $x$ differs from $y$. Consider now an encoding map $\mathrm{ECC} : \Sigma^k \to \Sigma^w$. A code $C$ is simply the image of such a map (that is, $C \subseteq \Sigma^w$), with $|C| = q^k$. The minimum distance of a code $C$ is defined as

$$d(C) = \min_{x, y \in C,\ x \neq y} d_H(x, y).$$

We use $[w, k, d]_q$ to denote a code $C$ with block length $w$ ($C \subseteq \Sigma^w$), message length $k = \log_q |C|$, minimum distance $d(C) = d$ and alphabet size $|\Sigma| = q$.

For the CCA construction we need a code whose words are as “far apart” as possible. In particular, for a fixed $k$, we need a code which maximizes $d/w$ under the restriction that $w$ is polynomial in $k$. By the Singleton bound [23], $d \leq w - k + 1$ for any code and alphabet size, which immediately gives an upper bound of $1 - \frac{k-1}{w}$ for $d/w$. Codes that meet the Singleton bound are called Maximum Distance Separable (MDS) codes.

⁴ For the purposes of the construction, we only need an appropriate encoding scheme and not a full-fledged error correcting scheme, in the sense that the ability to decode is unnecessary for the construction.

Reed-Solomon Codes. Reed-Solomon codes (introduced in [18]) are an example of MDS codes. We describe a (simplified) construction of a family of asymptotic Reed-Solomon codes. Let $RS^q_{w,k}$ denote a Reed-Solomon code (or more precisely a family of RS codes) with message length $k$, block length $w$ and alphabet size $|\Sigma| = q$ (with $q > w$). The construction works as follows:

• Generation: Pick a field $\mathbb{F}_q$ (for convenience we use $\mathbb{Z}_q$ as the underlying field, where $q$ is the smallest prime such that $q > w$). Pick also $w$ distinct elements $\alpha_1, \ldots, \alpha_w \in \mathbb{Z}_q$ (evaluation points).

• Encoding: Let $m = (m_0, \ldots, m_{k-1}) \in \Sigma^k$ be a message and let $m(x) = \sum_{i=0}^{k-1} m_i x^i$ be the corresponding polynomial. The encoding of the message is defined as

$$\mathrm{ECC}(m) = (m(\alpha_1), \ldots, m(\alpha_w)) \in \mathbb{Z}_q^w,$$

where the evaluation takes place over $\mathbb{Z}_q$.

Lemma 2. The Reed-Solomon code $RS^q_{w,k}$ has minimum distance $d = w - k + 1$. Also, both the code length and the time complexity of the encoding are polynomial in $w$.
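The construction and Lemma 2 can be checked by brute force for tiny parameters. The Python sketch below assumes the evaluation points $\alpha_i = i - 1$ (any $w$ distinct elements of $\mathbb{Z}_q$ would do) and compares the minimum distance over all $q^k$ codewords with $w - k + 1$.

```python
from itertools import combinations, product


def smallest_prime_above(w):
    """Smallest prime q with q > w."""
    q = w + 1
    while q < 2 or any(q % d == 0 for d in range(2, int(q ** 0.5) + 1)):
        q += 1
    return q


def rs_encode(msg, q, points):
    """ECC(m) = (m(alpha_1), ..., m(alpha_w)) with m(x) = sum_i m_i x^i over Z_q."""
    return tuple(sum(c * pow(a, i, q) for i, c in enumerate(msg)) % q for a in points)


def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))


def rs_code(w, k):
    """All q^k codewords, using the (assumed) evaluation points alpha_i = i - 1."""
    q = smallest_prime_above(w)
    points = range(w)
    return q, [rs_encode(m, q, points) for m in product(range(q), repeat=k)]


def min_distance(code):
    """d(C) = min over distinct codewords x, y of d_H(x, y), by brute force."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))


q, code = rs_code(w=6, k=3)
```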


3 Products and Correlated Inputs 

In this section we define w-wise products, prove the lossiness amplification lemma 
that we use throughout the paper, and finally present the types of correlated 
input distributions we are interested in for our CCA result. 

3.1 Products and Lossiness Amplification 

We first define the $w$-wise product of a collection of functions.

Definition 1 ($w$-wise product, Definition 3.1 in [21]). Let $\mathcal{F} = (G, F)$ be a collection of efficiently computable functions. For any integer $w$, we define the $w$-wise product $\mathcal{F}^w = (G_w, F_w)$ as follows:

- The generation algorithm $G_w$ on input $1^\lambda$ invokes $G(1^\lambda)$ for $w$ times independently and outputs $(s_1, \ldots, s_w)$. That is, a function is sampled from $\mathcal{F}^w$ by independently sampling $w$ functions from $\mathcal{F}$.

- The evaluation algorithm $F_w$ on input $(s_1, \ldots, s_w, x_1, \ldots, x_w)$ invokes $F$ to evaluate each function $s_i$ on $x_i$. That is, $F_w(s_1, \ldots, s_w, x_1, \ldots, x_w) = (F(s_1, x_1), \ldots, F(s_w, x_w))$.


Chosen-Ciphertext Security from Slightly Lossy Trapdoor Functions


We will use the following lemma throughout the rest of the paper. It states that $w$-wise products (for $w = \mathrm{poly}(\lambda)$) amplify the absolute amount of lossiness⁵.

Lemma 3 (Lossiness Amplification). Let $\lambda$ be a security parameter. For any family of TDFs $\mathcal{F} = (G, F, F^{-1})$ with input length $n(\lambda)$, if $\mathcal{F}$ is $(n(\lambda), \ell(\lambda))$-lossy, then the $w(\cdot)$-wise product family $\mathcal{F}^w$ (defined above) built from $\mathcal{F}$ is $(n(\lambda) \cdot w(\lambda), \ell(\lambda) \cdot w(\lambda))$-lossy for all $w = \mathrm{poly}(\lambda)$.

Proof. First, if there exists an efficient lossy key generation algorithm $\hat{G}$ that outputs function indices indistinguishable from those of $G$, then by a standard hybrid argument it follows that $\hat{G}_w$, which runs $\hat{G}$ independently $w$ times to get $(s_1, t_1), \ldots, (s_w, t_w)$ and outputs $(s, t)$ where $s = (s_1, \ldots, s_w)$ and $t = (t_1, \ldots, t_w)$, outputs keys indistinguishable from those of $G_w$.

Second, since for each $s_i$ output by $\hat{G}$ the map $F(s_i, \cdot)$ has range size at most $2^{n - \ell}$, it follows that for each $s$ output by $\hat{G}_w$, the map $F_w(s, \cdot)$ has range size at most $(2^{n - \ell})^w = 2^{nw - \ell w}$. □
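The range-size half of the proof is simple counting, illustrated below on a hypothetical toy map that discards the $\ell$ low-order bits of an $n$-bit input. The map is not a trapdoor function and has no indistinguishability property, so only the image-size bound is illustrated. The helper `copies_for` (our name) spells out the arithmetic behind amplifying a noticeable fraction of a bit: with $\ell = 1/p(\lambda)$ bits lost per copy, $w = 2p(\lambda)$ copies lose 2 bits.

```python
import math
from fractions import Fraction
from itertools import product


def drop_low_bits(ell):
    """Hypothetical toy map on n-bit integers: x -> x >> ell, image size 2^(n - ell)."""
    return lambda x: x >> ell


def wise_product(fs):
    """F_w(x_1, ..., x_w) = (f_1(x_1), ..., f_w(x_w))."""
    return lambda xs: tuple(f(x) for f, x in zip(fs, xs))


n, ell, w = 3, 1, 3
F_w = wise_product([drop_low_bits(ell)] * w)
image = {F_w(xs) for xs in product(range(2 ** n), repeat=w)}
lost_bits = math.log2(2 ** (n * w) / len(image))  # n*w - log2 |image|


def copies_for(target_bits, ell_per_copy):
    """Smallest w with w * ell_per_copy >= target_bits (Lemma 3 arithmetic)."""
    return math.ceil(Fraction(target_bits) / Fraction(ell_per_copy))
```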

An immediate implication of Lemma 3 is that $(n, \frac{1}{\mathrm{poly}(\lambda)})$-LTDFs imply injective trapdoor one-way functions and CPA-secure encryption (the proofs of these statements are rather straightforward and hence omitted). We simply state this observation as a corollary for completeness.

Corollary 1. Let $p(\cdot)$ be a polynomial. Then $(n, \frac{1}{p(\lambda)})$-LTDFs imply injective trapdoor one-way functions and CPA-secure encryption schemes.


3.2 Subset Reconstructible Distributions 

While it is well known that if $\mathcal{F}$ is one-way with respect to the uniform distribution on $\{0,1\}^n$, then the product $\mathcal{F}^w$ is one-way with respect to the uniform distribution over $\{0,1\}^{nw}$, we will be interested in the security of products when the inputs are correlated and not necessarily uniform. In particular, we will consider input distributions that are what we call $(d,w)$-subset reconstructible.

Definition 2 ($(d,w)$-Subset Reconstructible Distribution (SRD)). Let $d, w \in \mathbb{N}$ such that $d < w$, let $S$ be a domain and $\mathcal{D}$ a distribution with support $\mathrm{Supp}(\mathcal{D}) \subseteq S^w$. We say that $\mathcal{D}$ is $(d,w)$-subset reconstructible (and denote it $\mathcal{SRD}_{d,w}$) if each $w$-tuple $(x_1, \ldots, x_w) \in \mathrm{Supp}(\mathcal{D})$ is fully and uniquely reconstructible from any subset $\{x_{i_1}, \ldots, x_{i_d}\}$ of $d$ distinct elements of the tuple.

It is easy to see that the special case where $d = 1$ and $S = \{0,1\}^n$ gives the uniform $w$-repetition distribution used in the simplified construction of the CCA-secure cryptosystems in [21]. For our CCA construction, we need to choose a value for $d$ smaller than $w$ (this is necessary for almost perfect simulation of the decryption oracle) but as close to $w$ as possible in order to minimize the required lossiness of the TDF (the closer the ratio $d/w$ is to 1, the less lossiness we need for the CCA construction). We note that the SRD notion is similar to other well-known notions in coding theory and cryptography; we compare and contrast them in [?].

⁵ We use the term “absolute amount of lossiness” to explicitly distinguish it from “rate of lossiness”, defined as $\frac{\ell}{n}$ for an $(n, \ell)$-LTDF. Amplifying the rate of lossiness seems to be a much harder problem than amplifying the absolute amount of lossiness.

Sampling via Polynomial Interpolation. We use polynomial interpolation
as a way to sample efficiently from $\mathcal{SRD}_{d,w}$ for any values of $d$ and $w$. The
construction is identical to the one used by Shamir [22] for a $(d,w)$-threshold
secret sharing scheme. On input a prime $Q$ (with $\log Q = O(\mathrm{poly}(\lambda))$) and
integers $d, w$, the sampling algorithm independently picks $d$ values $p_0, \ldots, p_{d-1}$
uniformly at random from $\mathbb{Z}_Q$ (these correspond to the $d$ coefficients of a polynomial
$p \in \mathbb{Z}_Q[x]$ of degree $d-1$). The algorithm then simply outputs $(x_1, \ldots, x_w) =
(p(1), \ldots, p(w))$, where evaluation takes place in $\mathbb{Z}_Q$ and the $x_i$'s are represented by
binary strings of length at most $\log Q$. The following lemma (proved in [11])
states that the output distribution of polynomial interpolation sampling is a
$(d,w)$-subset reconstructible distribution with sufficient entropy.
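The sampler and the reconstruction property it guarantees can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: evaluation points are $1, \ldots, w$ with $w < Q$ (so they are distinct in $\mathbb{Z}_Q$), Python's `random` module stands in for a proper source of randomness, reconstruction uses Lagrange interpolation, and the function names `sample_srd` and `reconstruct` are hypothetical.

```python
import random
from itertools import combinations


def sample_srd(Q, d, w, rng=random):
    """Sample (x_1, ..., x_w) = (p(1), ..., p(w)) for a uniformly random p of degree < d over Z_Q."""
    assert w < Q, "evaluation points 1..w must be distinct and nonzero mod Q"
    coeffs = [rng.randrange(Q) for _ in range(d)]  # p_0, ..., p_{d-1}

    def p(t):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule modulo Q
            acc = (acc * t + a) % Q
        return acc

    return [p(i) for i in range(1, w + 1)]


def reconstruct(Q, w, known):
    """Rebuild the whole tuple from d known pairs {index: x_index} via Lagrange interpolation."""
    pts = list(known.items())

    def p(t):
        total = 0
        for j, (ij, yj) in enumerate(pts):
            num, den = 1, 1
            for m, (im, _) in enumerate(pts):
                if m != j:
                    num = num * (t - im) % Q
                    den = den * (ij - im) % Q
            total = (total + yj * num * pow(den, -1, Q)) % Q  # Q prime, so den is invertible
        return total

    return [p(i) for i in range(1, w + 1)]


if __name__ == "__main__":
    Q, d, w = 2**61 - 1, 3, 8  # 2^61 - 1 is prime
    x = sample_srd(Q, d, w, random.Random(1))
    for subset in combinations(range(1, w + 1), d):
        assert reconstruct(Q, w, {i: x[i - 1] for i in subset}) == x
    print(f"every {d}-subset reconstructs the full {w}-tuple")
```

Since the $d$ coefficients are uniform and independent, the tuple carries $d \cdot \log Q$ bits of min-entropy, which is the quantity Lemma 4 records.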

Lemma 4. Let $w = \mathrm{poly}(\lambda)$. Then the above algorithm is a $\mathrm{poly}(\lambda)$-sampling
algorithm for $\mathcal{SRD}_{d,w}$. Also, the min-entropy of the distribution $\mathcal{SRD}_{d,w}$ is
$d \cdot \log Q$.

4 CCA Security from Functions with Small Lossiness 

In this section we prove our main result: lossy TDFs that lose a noticeable fraction
of a bit imply CCA-secure encryption. We start by describing the encryption
scheme of Rosen and Segev [21], which shows that CCA security is implied by the
security (one-wayness) of injective trapdoor functions under certain correlated
products. We then show that $(n, 2)$-lossy TDFs imply injective trapdoor functions
that are secure under these correlated products. We complete the proof
by observing that $(n, 2)$-lossy TDFs can be constructed in a black-box way from
LTDFs that lose a $1/\mathrm{poly}(\lambda)$ fraction of a single bit (this follows from a straightforward
lossiness amplification argument).
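As we read it, the amplification step is just the counting behind Lemma 3: a $w$-wise product of $(n,\ell)$-lossy functions has image size at most $(2^{n-\ell})^w$, so the absolute lossiness adds up across copies. Writing $p(\lambda)$ for the polynomial of Corollary 1 and choosing $w = 2p(\lambda)$ gives:

```latex
\begin{align*}
  |\mathrm{Img}(F_w(\tilde{\mathbf{s}}, \cdot))| &\le \bigl(2^{n-\ell}\bigr)^{w} = 2^{nw - \ell w},\\
  \ell = \frac{1}{p(\lambda)},\ w = 2p(\lambda) &\implies \ell w = 2 \text{ and } nw = 2p(\lambda)\, n = \mathrm{poly}(\lambda).
\end{align*}
```

So the product behaves as a $(2p(\lambda)n, 2)$-lossy TDF, with indistinguishability of injective and lossy product keys following from a hybrid over the $w$ coordinates.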

For ease of presentation, we describe a single-bit encryption scheme. Due to 
a recent result [12], this directly implies the existence of multi-bit CCA-secure 
schemes. We mention however that one can get a multi-bit encryption scheme 
directly by simply replacing the hardcore predicate h with a universal hash 
function, as in the PKE schemes of [15].

4.1 The Rosen-Segev Construction 

We recall the cryptosystem from [21]. Let $\mathcal{F} = (G, F, F^{-1})$ be a collection of
injective trapdoor functions, and let $\mathcal{C}_w$ be an input distribution such that any
$\mathbf{x} = (x_1, \ldots, x_w)$ output by $\mathcal{C}_w(1^\lambda)$ can be reconstructed given any subset of $\mathbf{x}$ of
size $d < w$. Let also $h: \{0,1\}^* \to \{0,1\}$ be a predicate, $\mathrm{ECC}: \Sigma^k \to \Sigma^w$
be the polynomial-time encoding function for an error-correcting code with distance $d$, and
$\Pi = (\mathrm{Kg}, \mathrm{Sign}, \mathrm{Ver})$ be a one-time signature scheme whose verification keys are
elements of $\Sigma^k$.

6 Any (fixed and public) distinct values $a_1, \ldots, a_w \in \mathbb{Z}_Q$ instead of $1, \ldots, w$ would work
just fine.

The RS encryption scheme works as follows:

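Key Generation: On input security parameter $1^\lambda$, for each $\sigma \in \Sigma$ and each
$1 \le i \le w$, run $(s_i^\sigma, t_i^\sigma) \xleftarrow{\$} G(1^\lambda)$, the key generation for the injective trapdoor
function family. Return the pair $(pk, sk)$ where

$$pk = \left(\{s_1^\sigma\}_{\sigma \in \Sigma}, \ldots, \{s_w^\sigma\}_{\sigma \in \Sigma}\right), \qquad sk = \left(\{t_1^\sigma\}_{\sigma \in \Sigma}, \ldots, \{t_w^\sigma\}_{\sigma \in \Sigma}\right).$$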

Encryption: On input public key $pk$ and one-bit message $m$, run $\mathrm{Kg}(1^\lambda)$
to generate $(VK, SK)$ and sample $(x_1, \ldots, x_w)$ from $\mathcal{C}_w(1^\lambda)$. Apply the error-correcting
code to $VK$ to get $\mathrm{ECC}(VK) = (\sigma_1, \ldots, \sigma_w)$. Then output

$$c = (VK, y_1, \ldots, y_w, c_1, c_2),$$

where $VK$ is as above and

$$y_i = F(s_i^{\sigma_i}, x_i), \quad 1 \le i \le w,$$
$$c_1 = m \oplus h(s_1^{\sigma_1}, \ldots, s_w^{\sigma_w}, x_1, \ldots, x_w),$$
$$c_2 = \mathrm{Sign}(SK, (y_1, \ldots, y_w, c_1)).$$

Decryption: On input secret key $sk$ and ciphertext $(VK, y_1, \ldots, y_w, c_1, c_2)$,
check if $\mathrm{Ver}(VK, (y_1, \ldots, y_w, c_1), c_2)$ equals 1. If not, output $\perp$. Otherwise,
compute $\mathrm{ECC}(VK) = (\sigma_1, \ldots, \sigma_w)$ and pick $d$ distinct indices $i_1, \ldots, i_d$. Use the
trapdoors $t_{i_1}^{\sigma_{i_1}}, \ldots, t_{i_d}^{\sigma_{i_d}}$ to compute

$$x_{i_1} = F^{-1}(t_{i_1}^{\sigma_{i_1}}, y_{i_1}), \ \ldots, \ x_{i_d} = F^{-1}(t_{i_d}^{\sigma_{i_d}}, y_{i_d}).$$

Use these $x_{i_j}$'s to reconstruct the entire vector $x_1, \ldots, x_w$. If $y_j = F(s_j^{\sigma_j}, x_j)$ for
all $1 \le j \le w$, output $c_1 \oplus h(s_1^{\sigma_1}, \ldots, s_w^{\sigma_w}, x_1, \ldots, x_w)$; otherwise output $\perp$.
Rosen and Segev proved the following theorem: 

Theorem 1 (Theorem 5.1 in [19]). If $\Pi$ is a one-time strongly unforgeable
signature scheme, $\mathcal{F}$ is secure under a $\mathcal{C}_w$-correlated product, and $h$ is a hardcore
predicate for $\mathcal{F}_w$ with respect to $\mathcal{C}_w$, then the above PKE scheme is IND-CCA
secure.
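To make the role of the distance $d$ concrete, here is a toy Python sketch (hypothetical helper names; tiny parameters $q = w = 7$, $k = 2$) of how $\mathrm{ECC}(VK)$ selects one key $s_i^{\sigma_i}$ per position, with a Reed-Solomon code standing in for ECC, together with a check that distinct verification keys yield codewords differing in at least $d = w - k + 1$ positions. As we understand the standard Rosen-Segev argument, this is what lets a simulator that lacks the trapdoors on the challenge codeword still invert at least $d$ coordinates of any other ciphertext, which suffices by $d$-reconstructibility.

```python
from itertools import product


def rs_encode(msg, q, w):
    """Reed-Solomon encoding: evaluate the polynomial with coefficients msg (lowest degree
    first) at the points 0, 1, ..., w-1 of Z_q (requires q prime and w <= q)."""
    assert w <= q
    codeword = []
    for point in range(w):
        acc = 0
        for coeff in reversed(msg):  # Horner's rule
            acc = (acc * point + coeff) % q
        codeword.append(acc)
    return codeword


def selected_keys(pk, vk, q, w):
    """Return the keys s_i^{sigma_i} used to encrypt, where (sigma_1, ..., sigma_w) = ECC(VK)."""
    return [pk[i][sigma] for i, sigma in enumerate(rs_encode(vk, q, w))]


def hamming(a, b):
    """Number of positions in which two codewords differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


if __name__ == "__main__":
    q, w, k = 7, 7, 2
    d = w - k + 1
    # Toy "public key": entry pk[i][sigma] just records the label (i, sigma).
    pk = [[(i, sigma) for sigma in range(q)] for i in range(w)]
    print(selected_keys(pk, (1, 2), q, w))
    vks = list(product(range(q), repeat=k))
    worst = min(hamming(rs_encode(a, q, w), rs_encode(b, q, w))
                for a in vks for b in vks if a != b)
    print("minimum distance:", worst, "required:", d)
```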

4.2 Our Result 

In this section we establish the following result:

Theorem 2 (Main Theorem). CCA-secure schemes can be constructed in a
black-box way from LTDFs that lose $1/\mathrm{poly}(\lambda)$ bits.

The proof proceeds in two steps. In the first step (Lemma 5), we show that lossy
TDFs give rise to families of injective trapdoor functions that are secure under
correlated product distributions with sufficiently large entropy. Moreover, the
more entropy the underlying distribution has, the less lossiness is needed from
our LTDFs. In the second and final step (Lemma 6), we show that, by choosing an
appropriate error-correcting code and a correlated input distribution with high
entropy in the Rosen-Segev scheme, we can achieve one-wayness under correlated
products (and hence CCA security) starting from lossy TDFs with minimal lossiness
requirements. More specifically, using the uniform $\mathcal{SRD}_{d,w}$ (which has high
entropy; see Lemma 4) as our underlying distribution and Reed-Solomon codes
for ECC, we show that $(n, 2)$-lossy TDFs suffice for CCA-secure encryption. We
then derive Theorem 2 by observing that $(n, 2)$-lossy TDFs can be constructed
from $(n', 1/\mathrm{poly}(\lambda))$-lossy functions (where $n = \mathrm{poly}(n')$) (see Lemma 3).

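Lemma 5. Let $\mathcal{F} = (G, F, F^{-1})$ be a collection of $(n, \ell)$-lossy trapdoor functions
and let $\mathcal{F}_w = (G_w, F_w)$ be its $w$-wise product for $w = \mathrm{poly}(\lambda)$. Let $\mathcal{C}_w$ be an
input distribution with min-entropy $\mu$. Then $\mathcal{F}_w$ is secure under a $\mathcal{C}_w$-correlated
product as long as

$$\ell \ge n - \frac{\mu}{w} + \frac{\omega(\log \lambda)}{w}.$$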

Proof. The proof is similar to a proof from [15]. Assume for contradiction that
there exists an inverter $\mathcal{I}$ that succeeds in inverting $\mathcal{F}_w$ with probability $1/p(\lambda)$
for some polynomial $p$. We will build an adversary $\mathcal{A}$ that can distinguish between
lossy keys and real keys. By a standard hybrid argument, it suffices
to show that there exists an adversary $\mathcal{A}$ that can distinguish with non-negligible
probability the case where it is given $w = \mathrm{poly}(\lambda)$ lossy keys (generated with
$\tilde{G}$) from the case where it is given $w = \mathrm{poly}(\lambda)$ real keys (generated with $G$).
Adversary $\mathcal{A}$, on input keys $\mathbf{s} = (s_1, \ldots, s_w)$, samples $\mathbf{x} = (x_1, \ldots, x_w)$ from
$\mathcal{C}_w(1^\lambda)$ and runs the inverter $\mathcal{I}(1^\lambda, \mathbf{s}, F_w(\mathbf{s}, \mathbf{x}))$. If $\mathbf{s}$ are real keys generated
by $G$, then $\mathcal{I}$ will output $\mathbf{x}$ with probability $1/p(\lambda)$. If, however, $\mathbf{s}$ come from $\tilde{G}$,
then the probability of success for $\mathcal{I}$ is at most $2^{-H_\infty(\mathbf{x} \mid (\mathbf{s}, F_w(\mathbf{s}, \mathbf{x})))}$.

To bound this probability, we use Lemma [...] to see that

$$H_\infty(\mathbf{x} \mid (\mathbf{s}, F_w(\mathbf{s}, \mathbf{x}))) \ge H_\infty(\mathbf{x} \mid \mathbf{s}) - w(n - \ell). \qquad (1)$$

Since the choice of the functions is independent of the choice of $\mathbf{x}$, the first
term on the right of the above inequality is simply $H_\infty(\mathbf{x})$, that is, $\mu$. Combining
this with (1), we get that

$$H_\infty(\mathbf{x} \mid (\mathbf{s}, F_w(\mathbf{s}, \mathbf{x}))) \ge \mu - w(n - \ell) \ge \omega(\log \lambda),$$

where in the last inequality we used the bound on $\ell$. It follows that the probability
that $\mathcal{I}$ succeeds when $\mathcal{A}$ is given lossy keys is upper bounded by
$2^{-\omega(\log \lambda)} = \mathrm{negl}(\lambda)$. Therefore, for that choice of $\ell$, the inverter has negligible
success probability on lossy keys, and thus $\mathcal{A}$ can distinguish between keys from $G$
and keys from $\tilde{G}$, which gives us our contradiction. □

Lemma 6. CCA-secure schemes can be constructed in a black-box way from 
(n, 2 )-lossy TDFs. 
Proof. Let $n = \mathrm{poly}(\lambda)$. Let also $\mathrm{ECC}: \mathbb{Z}_q^k \to \mathbb{Z}_q^w$ be a Reed-Solomon code with
$k = n^\epsilon$ (for some constant $\epsilon$ with $0 < \epsilon < 1$), $w = n^c$ for some constant $c > 1 + \epsilon$,
$q$ the smallest prime such that $q \ge w$, and distance $d = w - k + 1$. Let also $\mathcal{C}_w$ be
the distribution $\mathcal{SRD}_{d,w}$ sampled via polynomial interpolation (see Section 3.2)
for some prime $Q$ such that $n - 1 < \log Q < n$. Finally, let $\mathcal{F} = (G, F, F^{-1})$
be a collection of $(n, 2)$-lossy trapdoor functions and $\mathcal{F}_w = (G_w, F_w)$ be its
$w$-wise product. By construction (Lemma 4, Section 3.2), $\mathcal{C}_w$ has min-entropy
$\mu = H_\infty(\mathcal{C}_w) = d \cdot \log Q$ and can be sampled in time $\mathrm{poly}(w) = \mathrm{poly}(\lambda)$. In
addition, by properties of Reed-Solomon codes, we have

$$\frac{d}{w} = \frac{w - k + 1}{w} > 1 - \frac{k}{w} = 1 - \frac{1}{n^{c-\epsilon}},$$

and hence

$$\frac{\mu}{w} = \frac{d}{w} \log Q > (n - 1)\left(1 - \frac{1}{n^{c-\epsilon}}\right).$$

Therefore, we have that

$$n - \frac{\mu}{w} + \frac{\omega(\log \lambda)}{w} < n - (n - 1)\left(1 - \frac{1}{n^{c-\epsilon}}\right) + \frac{\omega(\log \lambda)}{w} = 1 + \frac{1}{n^{c-\epsilon-1}} - \frac{1}{n^{c-\epsilon}} + \frac{\omega(\log \lambda)}{n^c} < 2$$

for some $\omega(\log \lambda)$ function (and sufficiently large $\lambda$, since $c - \epsilon - 1 > 0$ and
$n^c = \mathrm{poly}(\lambda)$). Applying Lemma 5, we get that $\mathcal{F}$ is secure under
the aforementioned $\mathcal{C}_w$-correlated product. Let $h$ be a hardcore predicate for the
$w$-wise product $\mathcal{F}_w$ (with respect to $\mathcal{C}_w$). Applying the Rosen-Segev construction
along with Theorem 1 from Section 4.1, we conclude that $(n, 2)$-lossy TDFs imply
CCA security (in a black-box sense). □
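The final inequality is asymptotic, but it is easy to sanity-check numerically. The following Python sketch plugs one illustrative parameter set into the quantity that Lemma 5 requires $\ell$ to exceed. The concrete choices ($n = 1024$, $\epsilon = 1/2$, $c = 2$, and $(\log_2 \lambda)^2$ with $\lambda = n$ as the $\omega(\log \lambda)$ function) are assumptions made only for this example, and taking $\log Q = n - 1$ under-estimates $\mu$, so the computed value errs on the conservative side.

```python
import math


def required_lossiness(n, eps, c, omega_log):
    """Evaluate n - mu/w + omega_log/w for the parameters of Lemma 6.

    Uses log Q = n - 1, which under-estimates mu (since n - 1 < log Q), so the
    returned value is an upper bound on what Lemma 5 asks for."""
    k = round(n ** eps)   # RS message length k = n^eps (rounded to an integer)
    w = n ** c            # number of copies / RS block length w = n^c
    d = w - k + 1         # RS (MDS) distance
    mu = d * (n - 1)      # min-entropy d * log Q of SRD_{d,w}
    return n - mu / w + omega_log / w


def closed_form_bound(n, eps, c, omega_log):
    """Right-hand side 1 + n^-(c-eps-1) - n^-(c-eps) + omega_log / n^c from the proof."""
    return 1 + n ** -(c - eps - 1) - n ** -(c - eps) + omega_log / n ** c


if __name__ == "__main__":
    n, eps, c = 1024, 0.5, 2        # illustrative choice satisfying c > 1 + eps
    omega_log = math.log2(n) ** 2   # one concrete omega(log lambda) function, with lambda = n
    print(required_lossiness(n, eps, c, omega_log), closed_form_bound(n, eps, c, omega_log))
```

Both values come out only slightly above 1, comfortably below the 2 bits of lossiness that an $(n, 2)$-LTDF provides.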


5 An Explicit Construction of a Slightly Lossy TDF 

The Idea. In this section we construct an LTDF that loses $1/4$ bits. At a
high level, our construction works as follows: the basic component is a trapdoor
function $g$ (with trapdoor $t$) that statistically loses $\ell$ bits ($\ell \ge 0$, where $\ell = 0$
corresponds to an injective trapdoor function). Let also $\tilde{g}$ be a deterministic
function such that $\tilde{g} \approx_c g$ (under some computational assumption CA) and $\tilde{g}$ loses
$\tilde{\ell}$ bits (that is, $|\mathrm{Img}(\tilde{g}(\mathrm{Dom}(\tilde{g})))| \le |\mathrm{Dom}(\tilde{g})|/2^{\tilde{\ell}}$) for some $\tilde{\ell} > \ell$. Consider now a function
$h$ such that $\|h(x)\| = \ell$ (where $\|\cdot\|$ denotes bitsize) and $(g(x), h(x))$ uniquely
determines the preimage $x$ (which can be efficiently recovered given the trapdoor
$t$) for all inputs $x$. The descriptions of the injective trapdoor function and the
lossy function are $s = (g, h)$ and $\tilde{s} = (\tilde{g}, h)$, respectively. It is not hard to see that
$\tilde{s}$ corresponds to an $(\tilde{\ell} - \ell)$-lossy function. Indeed, $|\mathrm{Img}(\tilde{s})| \le |\mathrm{Img}(\tilde{g}(\mathrm{Dom}(\tilde{g})))| \cdot 2^{\ell} \le
|\mathrm{Dom}(\tilde{g})|/2^{\tilde{\ell}-\ell}$. Finally, the indistinguishability of $g$ and $\tilde{g}$ implies that $s \approx_c \tilde{s}$.

Below we give an example of how to instantiate our technique using, as a core
trapdoor function, squaring modulo a composite modulus $N$. We believe that
our technique might serve as a paradigm for the construction of LTDFs from
other hardness assumptions in the future.

Hardness assumption. Consider the following two distributions (where
$n = \mathrm{poly}(\lambda)$):

$$\mathsf{2Primes}_n = \{N = pq \mid \|N\| = n;\ p, q \text{ distinct primes};\ p \equiv q \equiv 3 \pmod 4\}$$
$$\mathsf{3Primes}_n = \{N = pqr \mid \|N\| = n;\ p, q, r \text{ distinct primes};\ pqr \equiv 1 \pmod 4\}$$

where $\|N\|$ denotes the bitsize of $N$, and $\|N\| = n$ implies that the most significant
bit of $N$ is 1.

Assumption 1 (2v3Primes). For any PPT algorithm $\mathcal{D}$ and any polynomial $p(\cdot)$,

$$\left|\Pr[\mathcal{D}(\mathsf{2Primes}_n) = 1] - \Pr[\mathcal{D}(\mathsf{3Primes}_n) = 1]\right| < \frac{1}{p(n)},$$

where the probability is taken over the randomness of sampling $N$ and the internal
randomness of $\mathcal{D}$.

This assumption (in a slightly different form) was introduced in [5] under the
name 2OR3A.

The Construction. For our function $g$ we use squaring modulo the product
$N$ of two large (balanced) primes $p$ and $q$. This function was the basis for the
Rabin cryptosystem [16]. We define a family of injective trapdoor functions
$\mathcal{F} = (G, F, F^{-1})$ as follows:

$G(1^\lambda)$: $N \leftarrow pq$, with $p \equiv q \equiv 3 \pmod 4$ and $pq$ of bitsize $n + 1$. That is,
$N \xleftarrow{\$} \mathsf{2Primes}_{n+1}$. Return $(s, t)$ where $s = N$ and $t = (p, q)$.

$\tilde{G}(1^\lambda)$: $N \leftarrow pqr$, with $pqr \equiv 1 \pmod 4$ and $pqr$ of bitsize $n + 1$. That is,
$N \xleftarrow{\$} \mathsf{3Primes}_{n+1}$. Return $(s, \perp)$ where $s = N$.

$F(s, x)$: Parse $s$ as $N$. On input $x \in \{0,1\}^n$ compute $y = x^2 \bmod N$. Define
$\mathcal{V}_N(x) = 1$ if $x > N/2$ and $\mathcal{V}_N(x) = 0$ otherwise, and $\mathcal{Q}_N(x) = 1$ if $\mathcal{J}_N(x) = 1$
and $\mathcal{Q}_N(x) = 0$ otherwise, where $\mathcal{J}_N(x)$ is the Jacobi symbol of $x$ modulo $N$.
Return $(y, \mathcal{V}_N(x), \mathcal{Q}_N(x))$.

$F^{-1}(t, y')$: Parse $t$ as $(p, q)$ and $y'$ as $(y, b_1, b_2)$. Compute the square roots
$x_1, \ldots, x_k$ of $y$ using $p$ and $q$. Compute also $\mathcal{V}_N(x_i)$ and $\mathcal{Q}_N(x_i)$ for all $i \in [k]$,
and output the (unique) $x_i$ such that $\mathcal{V}_N(x_i) = b_1$ and $\mathcal{Q}_N(x_i) = b_2$.
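The injective branch can be exercised end to end with toy numbers. The Python sketch below (hypothetical function names; primes far too small for any security) implements $F$ and $F^{-1}$ as described, computing square roots modulo a prime $p \equiv 3 \pmod 4$ as $y^{(p+1)/4} \bmod p$ and combining them with the Chinese Remainder Theorem, and then checks by brute force that the map is injective and invertible on $\{0,1\}^n$ for the 7-bit Blum integer $N = 7 \cdot 11$.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, computed via quadratic reciprocity."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def evaluate(N, x):
    """F(s, x) with s = N: returns (x^2 mod N, V_N(x), Q_N(x))."""
    v = 1 if 2 * x > N else 0           # V_N(x) = 1 iff x > N/2
    q = 1 if jacobi(x, N) == 1 else 0   # Q_N(x) = 1 iff J_N(x) = +1
    return (x * x % N, v, q)


def crt(a, p, b, q):
    """The unique z mod pq with z = a (mod p) and z = b (mod q)."""
    return (a + p * ((b - a) * pow(p, -1, q) % q)) % (p * q)


def invert(p, q, image):
    """F^{-1}(t, y') with t = (p, q), p = q = 3 (mod 4): try every square root of y."""
    y, _, _ = image
    N = p * q
    rp = pow(y, (p + 1) // 4, p)  # a square root of y mod p (y is a square, p = 3 mod 4)
    rq = pow(y, (q + 1) // 4, q)
    roots = {crt(sp, p, sq, q) for sp in (rp, -rp % p) for sq in (rq, -rq % q)}
    matches = [x for x in roots if evaluate(N, x) == image]
    assert len(matches) == 1, "the two extra bits should single out one root"
    return matches[0]


if __name__ == "__main__":
    p, q = 7, 11                  # toy Blum primes; N = 77 has n + 1 = 7 bits
    N = p * q
    n = N.bit_length() - 1
    outputs = [evaluate(N, x) for x in range(2 ** n)]
    assert len(set(outputs)) == 2 ** n  # injective on {0,1}^n
    assert all(invert(p, q, evaluate(N, x)) == x for x in range(2 ** n))
    print("injective and invertible on", 2 ** n, "inputs")
```

Trying all (at most four) CRT combinations and filtering by the two extra bits mirrors the uniqueness argument in the proof below: $\mathcal{V}_N$ separates $\pm x$, and $\mathcal{Q}_N$ separates $x$ from $z$.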

Note that even though the modulus $N$ has bitsize $n + 1$ (that is, $N > 2^n$), the
domain of the functions is $\{0,1\}^n$.

Theorem 3. $\mathcal{F}$ as given above is a family of $(n, 1/4)$-lossy trapdoor functions
under the 2v3Primes assumption.

7 The requirement $pqr \equiv 1 \pmod 4$ is essential, since otherwise there exists a trivial
algorithm that distinguishes between moduli $N$ sampled according to $G$ and those sampled
according to $\tilde{G}$.
Proof. We prove the properties one by one:

Injectivity/Trapdoor: First, $F(s, x)$ is efficiently computable ($\mathcal{J}_N(x)$ can be efficiently
computed even if the factorization of $N$ is unknown). Now let $s = N$ ($N$
being a Blum integer) and $y' = F(s, x) = (y, b_1, b_2)$.

If $y \in \mathbb{Z}_N^*$, then it has 4 square roots modulo $N$, which can be recovered using
the trapdoor $(p, q)$ (by first recovering the pairs of square roots modulo $p$ and
$q$ separately and then combining them using the Chinese Remainder Theorem).
Let $\pm x, \pm z$ be the 4 square roots of $y$. Since $\mathcal{V}_N(x) = 1 - \mathcal{V}_N(-x)$ for all $x \in \mathbb{Z}_N^*$, only one of
$x, -x$ and only one of $z, -z$ is consistent with $b_1$. Assume w.l.o.g. that $x, z$ are consistent
with $b_1$. Also, since $x \ne \pm z$, $\mathcal{J}_N(z) = -\mathcal{J}_N(x)$, and hence only one of $x, z$ is
consistent with $b_2$.

If $\gcd(y, N) > 1$ (w.l.o.g. $\gcd(y, N) = p$), then $y$ has exactly 2 square roots
(preimages) $x$ and $-x$ (which can be recovered using the CRT), of which
only one is consistent with $b_1$.

This means that for all $(n+1)$-bit Blum integers $N$ and all $x \in \{0,1\}^n$,
the triple $(x^2 \bmod N, \mathcal{V}_N(x), \mathcal{Q}_N(x))$ uniquely determines $x$, which, given $(p, q)$,
can be efficiently recovered. Hence $\mathcal{F}$ (defined over $\{0,1\}^n$) is a
collection of injective trapdoor functions.

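Lossiness: Let $(\tilde{s} = N, \perp) \leftarrow \tilde{G}(1^\lambda)$. Consider the sets

$$S_1 = \{x \in \{0,1\}^n \mid x \in \mathbb{Z}_N^* \text{ and } x < N/2\},$$
$$S_2 = \{x \in \{0,1\}^n \mid \gcd(x, N) > 1 \text{ and } x < N/2\},$$
$$S_3 = \{x \in \{0,1\}^n \mid x > N/2\},$$

which form a partition of $\{0,1\}^n$. Squaring modulo $N = pqr$ is an 8-to-1 function
over $\mathbb{Z}_N^*$, which means that $y$ takes at most $\phi(N)/8$ values. Also, for all $x \in S_1$,
$\mathcal{V}_N(x) = 0$ by definition. Hence $(x^2 \bmod N, \mathcal{V}_N(x), \mathcal{Q}_N(x))$ for $x \in S_1$ takes at
most $\frac{\phi(N)}{8} \cdot 2$ values, that is,

$$|\mathrm{Img}(S_1)| \le \frac{\phi(N)}{4}. \qquad (2)$$

Also, $|S_2| = \frac{N - \phi(N)}{2}$ (there are $N - \phi(N)$ elements that are not coprime with
$N$, and exactly half of them are smaller than $N/2$). Finally, $|S_3| < 2^n - \frac{N}{2}$. We
then have that

$$|\mathrm{Img}(S_2)| \le |S_2| \le \frac{N - \phi(N)}{2} \quad \text{and} \quad |\mathrm{Img}(S_3)| \le |S_3| < 2^n - \frac{N}{2}. \qquad (3)$$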


8 It is easy to prove that if $N$ is a Blum integer and $x, z \in \mathbb{Z}_N^*$ are such that $x \ne \pm z$ and
$x^2 = z^2 = y \pmod N$, then $\mathcal{J}_N(x) = -\mathcal{J}_N(z)$.
Combining equations (2) and (3), we get

$$|\mathrm{Img}(\{0,1\}^n)| \le \sum_{i=1}^{3} |\mathrm{Img}(S_i)| < \frac{\phi(N)}{4} + \frac{N - \phi(N)}{2} + 2^n - \frac{N}{2} = 2^n - \frac{\phi(N)}{4} < \frac{4}{5} \cdot 2^n < 2^{n - 1/4},$$

where in the penultimate inequality we used the fact that (for balanced primes
$p, q, r$) $\phi(N) = N - O(N^{2/3})$, and hence $\frac{\phi(N)}{4} > \frac{N}{5} > \frac{2^n}{5}$. Therefore the image of
$\{0,1\}^n$ when $N$ is a product of 3 primes has size at most $2^{n-1/4}$, which implies that in this
case $F(\tilde{s}, \cdot)$ loses (at least) $1/4$ bits.

Indistinguishability: The fact that $s \approx_c \tilde{s}$ (where $(s, \cdot) \leftarrow G(1^\lambda)$ and $(\tilde{s}, \cdot) \leftarrow
\tilde{G}(1^\lambda)$) follows directly from the 2v3Primes assumption. □
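The counting argument can be checked by brute force on toy moduli. The Python sketch below (hypothetical helper names; toy primes chosen only so that the example runs quickly, far from cryptographic sizes) counts the distinct triples $(x^2 \bmod N, \mathcal{V}_N(x), \mathcal{Q}_N(x))$ over $x \in \{0,1\}^n$. For the 15-bit Blum integer $N = 127 \cdot 131$ the image is all $2^{14}$ inputs, while for the 15-bit three-prime modulus $N = 23 \cdot 29 \cdot 31$ (with $pqr \equiv 1 \pmod 4$) it stays below $2^n - \phi(N)/4$ and hence below $2^{n-1/4}$.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, computed via quadratic reciprocity."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def image_size(N, n):
    """Number of distinct triples (x^2 mod N, V_N(x), Q_N(x)) over x in {0,1}^n."""
    return len({(x * x % N, 1 if 2 * x > N else 0, 1 if jacobi(x, N) == 1 else 0)
                for x in range(2 ** n)})


if __name__ == "__main__":
    n = 14
    print("Blum modulus:", image_size(127 * 131, n), "of", 2 ** n)   # 16637, 15 bits
    p, q, r = 23, 29, 31                                              # pqr = 1 (mod 4)
    N = p * q * r                                                     # 20677, 15 bits
    phi = (p - 1) * (q - 1) * (r - 1)
    print("three primes:", image_size(N, n), "<=", 2 ** n - phi // 4, "<", 2 ** (n - 0.25))
```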
Abstract. We revisit the problem of constructing efficient secure two-party
protocols for set-intersection and set-union, focusing on the model
of malicious parties. Our main results are constant-round protocols that 
exhibit linear communication and a linear number of exponentiations 
with simulation based security. In the heart of these constructions is a 
technique based on a combination of a perfectly hiding commitment and 
an oblivious pseudorandom function evaluation protocol. Our protocols 
readily transform into protocols that are UC-secure. 

Keywords: Secure two-party computation, Simulation based security, 
Set intersection, Set union, Oblivious pseudorandom function evaluation. 


1 Introduction 

Secure function evaluation (SFE) allows two distrusting parties to jointly compute
a function of their respective inputs as if the computation were executed in
an ideal setting where the parties send their inputs to a trusted party that performs
the computation and returns its result. Starting with the work of [...],
it is by now well known that (in various settings, and considering semi-honest
and malicious adversaries) any polynomial-time computation can be generically
compiled into a secure function evaluation protocol with polynomial complexity.
However, more often than not, the resulting protocols are inefficient for practical
use, and hence attention has been given to constructing efficient protocols for specific
functions. This approach proved quite successful in the semi-honest setting
(see, e.g., [...]), while the malicious setting remained, by and large,
elusive (a notable exception is [...]).

We focus on the secure computation of basic set operations (intersection and
union), where the parties $P_1, P_2$, holding input sets $X, Y$ respectively, wish to
compute $X \cap Y$ or $X \cup Y$. These problems have been widely studied by
researchers in the last few years, and our main goal is to come up with protocols
for set-intersection and set-union that are secure in the malicious setting and
have better complexity than those known.

We begin by briefly surveying the current constructions of two-party secure 
computation for set intersection and union that are most relevant to our work: 

— Freedman, Nissim and Pinkas studied set intersection in [17]. They represent
a set by a polynomial that is zero exactly on the set elements. Their construction
for the semi-honest setting utilizes oblivious polynomial evaluation and
a balanced allocation scheme, and exhibits linear communication (counting
field elements) and (almost) linear computation (counting modular exponentiations).
See Section 3.

They also present variants of the above protocol for the cases where one of
the parties is malicious and the other is semi-honest. For efficiency, generic
zero-knowledge proofs of adherence to the protocol are avoided. The protocol
for malicious $P_1$ (denoted client in [17]) and semi-honest $P_2$ (server) utilizes a
cut-and-choose strategy, and hence communication is inflated by a statistical
security parameter. The protocol for malicious $P_2$ and semi-honest $P_1$ is in
the random oracle model. A protocol that is secure in the fully malicious
setting, combining both techniques, is sketched in Section 3.1.

— Kissner and Song [25] used polynomials to represent multi-sets. Letting the
roots of $Q_X(\cdot)$ and $Q_Y(\cdot)$ coincide with the elements of the multi-sets $X$ and $Y$,
they observed that if $r(\cdot), s(\cdot)$ are polynomials chosen at random, then the
roots of $r(\cdot) \cdot Q_X(\cdot) + s(\cdot) \cdot Q_Y(\cdot)$ coincide with high probability with the
multi-set $X \cap Y$. This beautiful observation yields a set-intersection protocol
for the semi-honest case, where the parties use an additively homomorphic
encryption scheme (the Paillier scheme is suggested in [25]) to perform the
polynomial multiplication, introducing computation costs quadratic in the
set sizes. For the security of the protocol, it is crucial that no party be
able to decrypt on her own. Hence, the secret key should be shared and joint
decryption should be deployed. Assuming a trusted setup for the encryption
scheme, the communication costs for the two-party case are as in the protocol
for semi-honest parties of [17].

For malicious parties, [25] introduced generic zero-knowledge proofs for
proving adherence to the prescribed protocol (e.g., zero-knowledge proofs
of knowledge for the multiplication of the encrypted $Q_X(\cdot)$ with a randomly
selected $r(\cdot)$). While this change seems to have dire consequences for the protocol
efficiency, the analysis in [25] ignores its effects. Furthermore, the costs
of setting up the shared key for the Paillier scheme are ignored in the analysis.
To the best of our knowledge, there are currently no efficient techniques for
generating shared Paillier keys that do not incorporate an external
trusted dealer (the latter schemes include [14,15], referenced in [25]).

In addition to that, Kissner and Song presented a protocol for the threshold 
set-union problem, where only the elements that appear in the combined 
inputs more than t times are learnt by the parties. Their protocol employs the 
same technique of polynomial multiplication and thus introduces quadratic 
computation costs as above. 
— Hazay and Lindell [21] revisited secure set intersection, with the aim of
achieving efficient protocols in the presence of a more realistic adversarial
behavior than in the benign semi-honest model, and under standard cryptographic
assumptions. Two protocols were presented: one achieves security
in the presence of malicious adversaries with one-sided simulatability, the
other is secure in the presence of covert adversaries [2]. The main tool used
in these protocols is a secure implementation of oblivious pseudorandom
function evaluation.

With $P_1, P_2$ holding sets of sizes $m_X, m_Y$ respectively, both protocols in [21]
are constant round, and incur the communication of $O(m_X \cdot p(n) + m_Y)$ group
elements and the computation of $O(m_X \cdot p(n) + m_Y)$ modular exponentiations,
where set elements are taken from $\{0,1\}^{p(n)}$.

We note that the protocols in [21] can be made secure in the malicious
setting, e.g., by introducing a secure key selection step for the oblivious PRF
and by adding zero-knowledge proofs of knowledge to show correctness at
each step, namely, to prove that the same PRF key is indeed used by party
$P_1$ in each iteration and to enable extraction of its input (as a pseudorandom
function is not necessarily invertible). While this would preserve the
asymptotic complexity of these protocols (in $m_X, m_Y$), the introduction
of such proofs would probably make them inefficient for practical use, since
there is no known efficient way to construct them.

— Recently, Jarecki and Liu [23] presented a very efficient protocol for computing
a pseudorandom function with a committed key (informally, this means
that the same key is used in all invocations), and showed that it yields an 
efficient set-intersection protocol. The main restriction of this construction 
is that the input domain size of the pseudorandom function should be poly¬ 
nomial in the security parameter (curiously, the proof of security for the 
set-intersection protocol makes use of the ability to exhaustively search over 
the input domain, so removing the restriction on the input domain of the 
pseudorandom function does not immediately yield a set-intersection proto¬ 
col for a super-polynomial domain). 

— Finally, Dachman-Soled et al. [...] present a protocol for set-intersection in
the presence of malicious adversaries without restricting the domain. Their
construction uses polynomial evaluation but takes a different approach than
ours by incorporating a secret sharing of the inputs to the polynomials.
They avoid generic zero-knowledge by utilizing the fact that Shamir's secret
sharing yields a Reed-Solomon code. Their protocol incurs communication
of $O(mk^2 \log^2 n + kn)$ group elements and computation of $O(mnk \log n +
mk^2 \log^2 n)$.


1.1 Our Contributions 

Our main contributions are efficient set-intersection and set-union protocols that
are secure in the presence of malicious parties. Our constructions are in the standard
model and are based on standard cryptographic assumptions (in particular, no
random oracle or trusted setup). We begin by briefly describing the construction
of Freedman et al. [17] for semi-honest parties, which serves as our starting
point.

Secure Set Intersection with Semi-Honest Parties. The main tool used in the
construction of [17] is oblivious polynomial evaluation. The basic protocol works
as follows:

1. Party $P_1$ chooses encryption/decryption keys $(pk, sk) \leftarrow G(1^n)$ for a homomorphic
encryption scheme $(G, E, D)$ and sends $pk$ to $P_2$.

2. $P_1$ computes the coefficients of a polynomial $Q(\cdot)$ of degree $m_X$, with roots
set to the $m_X$ elements of $X$, and sends the encrypted coefficients to $P_2$.

3. For each element $y \in Y$ (in random order), party $P_2$ chooses a random value
$r$ (taken from an appropriate set depending on the encryption scheme), and
uses the homomorphic properties of the encryption scheme to compute an
encryption of $r \cdot Q(y) + y$. $P_2$ sends the encrypted values to $P_1$.

4. Upon receiving these encrypted values, $P_1$ extracts $X \cap Y$ by decrypting each
value and then checking if the result is in $X$. Note that if $y \in X \cap Y$, then by
the construction of the polynomial $Q(\cdot)$ we get that $r \cdot Q(y) + y = r \cdot 0 + y = y$.
Otherwise, $r \cdot Q(y) + y$ is a random value that reveals no information about
$y$ and (with high probability) is not in $X$.
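The four steps can be traced in a minimal, insecure Python sketch. It assumes a toy Paillier instance as the additively homomorphic scheme (the description above keeps the scheme abstract; the authors' own protocols later use El Gamal instead), generates keys locally from two public Mersenne primes, uses hypothetical function names, and omits the balanced allocation optimization discussed next.

```python
import math
import secrets

# Toy Paillier instance (illustration only: these Mersenne primes are public and far too small).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # requires Python 3.9+
MU = pow(LAM, -1, N)           # decryption constant for the generator g = N + 1


def enc(m):
    r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def hom_add(c1, c2):   # Enc(a) * Enc(b) = Enc(a + b)
    return c1 * c2 % N2


def hom_scale(c, k):   # Enc(a)^k = Enc(k * a)
    return pow(c, k % N, N2)


def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod (t - x) over Z_N."""
    coeffs = [1]
    for x in roots:
        shifted = [0] + coeffs                         # t * current polynomial
        scaled = [(-x * a) % N for a in coeffs] + [0]  # -x * current polynomial
        coeffs = [(u + v) % N for u, v in zip(shifted, scaled)]
    return coeffs


def p1_send_polynomial(X):       # steps 1-2 (pk is implicit in the module constants)
    return [enc(a) for a in poly_from_roots(X)]


def p2_evaluate(enc_coeffs, Y):  # step 3
    replies = []
    for y in Y:
        acc = enc(0)
        for i, c in enumerate(enc_coeffs):  # Enc(Q(y)) = prod Enc(a_i)^(y^i)
            acc = hom_add(acc, hom_scale(c, pow(y, i, N)))
        r = secrets.randbelow(N - 1) + 1
        replies.append(hom_add(hom_scale(acc, r), enc(y)))  # Enc(r * Q(y) + y)
    secrets.SystemRandom().shuffle(replies)  # do not reveal the order of Y
    return replies


def p1_output(replies, X):       # step 4
    xs = set(X)
    return {m for m in map(dec, replies) if m in xs}


if __name__ == "__main__":
    X, Y = [3, 17, 42, 99], [5, 17, 99, 1000]
    print(sorted(p1_output(p2_evaluate(p1_send_polynomial(X), Y), X)))
```

Except with negligible probability, this prints `[17, 99]`: the non-matching elements of $Y$ decrypt to essentially uniform values modulo $N$.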

Note that the communication complexity of this simple scheme is linear in $m_X +
m_Y$, as $m_X + 1$ encrypted values are sent from $P_1$ to $P_2$ (these are the encrypted
coefficients of $Q(\cdot)$), and $m_Y$ encrypted values are sent from $P_2$ to $P_1$ (i.e., encryptions
of $r \cdot Q(y) + y$ for every $y \in Y$). However, the work performed by $P_2$ is high, as each of the $m_Y$
oblivious polynomial evaluations includes performing $O(m_X)$ exponentiations,
totaling $O(m_X \cdot m_Y)$ exponentiations.

To save on computational work, Freedman et al. introduced a balanced allocation
scheme into the protocol. Loosely speaking, they used the balanced
allocation scheme of [...] with $B = m_X/\log\log m_X$ bins, each of size $M = O(m_X/B +
\log\log B) = O(\log\log m_X)$. Party $P_1$ now uses the balanced allocation scheme
to hash every $x \in X$ into one of the $B$ bins, resulting (with high probability)
in each bin's load being at most $M$. Instead of a single polynomial of degree
$m_X$, party $P_1$ now constructs a degree-$M$ polynomial for each of the $B$ bins,
i.e., polynomials $Q_1(\cdot), \ldots, Q_B(\cdot)$ such that the roots of $Q_i(\cdot)$ are the elements
put in bin $i$. As some of the bins contain fewer than $M$ elements, $P_1$ pads each
polynomial with zero coefficients up to degree $M$. Upon receiving the encrypted
polynomials, party $P_2$ obliviously evaluates the encryptions of $r_0 \cdot Q_{h_0(y)}(y) + y$
and $r_1 \cdot Q_{h_1(y)}(y) + y$ for each of the two bins $h_0(y), h_1(y)$ in which $y$ can be
allocated, enabling $P_1$ to extract $X \cap Y$ as above.

Neglecting constant factors, the communication complexity is not affected,
as $P_1$ now sends $BM = O(m_X)$ encrypted values and $P_2$ replies with $2m_Y$
encrypted values. There is, however, a dramatic reduction in the work performed
by $P_2$, as each of the oblivious polynomial evaluations now amounts to performing
just $O(M)$ exponentiations, and $P_2$ performs $O(m_Y \cdot M) = O(m_Y \cdot \log\log m_X)$
exponentiations overall.
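The allocation step on $P_1$'s side can be sketched in Python. This is a sketch under assumptions: we take the cited balanced allocation scheme to be the classical greedy two-choice process (each element goes to the less loaded of its two candidate bins), SHA-256 stands in for the two random hash functions $h_0, h_1$, $B$ is rounded from $m_X/\log\log m_X$, and all names are hypothetical. The per-bin polynomials are padded with zero coefficients up to degree $M$, as described above.

```python
import hashlib
import math


def candidate_bins(x, B):
    """Two bin choices h0(x), h1(x); SHA-256 stands in for random hash functions (assumption)."""
    digest = hashlib.sha256(repr(x).encode()).digest()
    return int.from_bytes(digest[:8], "big") % B, int.from_bytes(digest[8:16], "big") % B


def allocate(X):
    """Greedy two-choice allocation of X into B ~ m / log log m bins."""
    m = len(X)
    B = max(1, round(m / math.log2(max(2.0, math.log2(max(m, 2))))))
    bins = [[] for _ in range(B)]
    for x in X:
        h0, h1 = candidate_bins(x, B)
        bins[h0 if len(bins[h0]) <= len(bins[h1]) else h1].append(x)  # less loaded bin
    return bins


def padded_polynomials(bins, prime):
    """Per-bin coefficients (lowest degree first) of prod (t - x) mod prime,
    padded with zero coefficients up to degree M = maximum bin load."""
    M = max(len(b) for b in bins)
    polys = []
    for b in bins:
        coeffs = [1]
        for x in b:  # multiply by (t - x): new c_i = c_{i-1} - x * c_i
            coeffs = [((coeffs[i - 1] if i > 0 else 0) - x * (coeffs[i] if i < len(coeffs) else 0)) % prime
                      for i in range(len(coeffs) + 1)]
        polys.append(coeffs + [0] * (M + 1 - len(coeffs)))
    return M, polys


if __name__ == "__main__":
    X = list(range(1000, 2024))
    bins = allocate(X)
    M, polys = padded_polynomials(bins, 2**61 - 1)
    print(len(bins), "bins, maximum load M =", M)
```

$P_2$ would then evaluate, for each $y$, only the two polynomials indexed by `candidate_bins(y, B)`, which is where the $O(M)$ cost per evaluation comes from.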
Our main goal is to come up with protocols that exhibit low asymptotic communication
and computation costs in the presence of malicious behavior. Noting
that asymptotic complexity does not reveal everything about a protocol's efficiency
or practicality, we avoid using generic zero-knowledge proofs of adherence
to the prescribed protocols, even when they involve relatively short statements,
and costly setup computations that make the protocols efficient only for very large inputs.
Our contributions are realized as follows.

Preventing the Players from Deviating from the Protocol: We inherit the oblivious
polynomial evaluation and balanced allocation techniques used in [17]. On
top of these we introduce an efficient zero-knowledge proof that $P_1$ uses to show
that her encrypted polynomials were correctly produced (unlike in [17], our proof
does not use a cut-and-choose strategy), and a technique preventing player $P_2$
from deviating meaningfully from the protocol. This technique combines a perfectly
hiding commitment scheme with an oblivious pseudorandom function evaluation
protocol.

Eliminating the Random Oracle: In some sense, our construction replaces the
random oracle used in [17] in the case of a malicious sender with a PRF, but this
'replacement' is only in a very weak sense: in our construction $P_2$ holds the key
for the pseudorandom function, and hence the function does not look random to
$P_2$, nor does $P_2$ need to invoke the oblivious pseudorandom evaluation
protocol to compute it. The consequence is that, unlike the simulator for
the protocol in the random oracle model, which can easily monitor all invocations
of the oracle, our simulator cannot extract $P_2$'s input to the pseudorandom
function.

We note that the protocols of [21] also use an oblivious pseudorandom function
evaluation primitive, where the player analogous to $P_2$ knows the key for the
function. Their usage of this primitive is, however, quite different from ours.
In the protocols of [21] the pseudorandom function is evaluated on the set of
elements that $P_2$ holds, using the same PRF key for all evaluations, whereas
in our protocols it is evaluated on a random payload using (possibly) different
keys. A payload $s_y$ is a random element that is chosen independently for each
element $y \in Y$ with the aim of fixing the randomness used by $P_2$; that is,
the randomness for the oblivious polynomial evaluation of $y$ is determined by
the PRF evaluation of $s_y$. Furthermore, the protocols in [21] are designed for the
covert adversary model and for the one-sided simulatability model, and hence a
technique enabling full simulation of $P_2$ is not needed there, whereas our constructions
allow simulation of both parties.
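The payload idea can be illustrated in isolation: once a PRF key is fixed, the randomness $r_y$ used for $y$ is a deterministic function of the payload $s_y$, so anyone who later learns $(k, s_y)$ can recompute it. The Python sketch below shows only this non-oblivious part. HMAC-SHA256 is an assumed stand-in PRF, the modulus $2^{127} - 1$ is a hypothetical randomness space, a single key is used for brevity (the text allows different keys), and the oblivious evaluation protocol itself is not modeled.

```python
import hashlib
import hmac
import secrets


def derive_randomness(prf_key: bytes, payload: bytes, modulus: int) -> int:
    """r_y = PRF_k(s_y) mod modulus, with HMAC-SHA256 standing in for the PRF (assumption).

    Reducing a 256-bit output modulo a much smaller modulus gives a negligibly biased value."""
    tag = hmac.new(prf_key, payload, hashlib.sha256).digest()
    return int.from_bytes(tag, "big") % modulus


if __name__ == "__main__":
    q = 2**127 - 1                 # hypothetical randomness space
    k = secrets.token_bytes(32)    # PRF key held by P2
    Y = [5, 17, 99]
    payloads = {y: secrets.token_bytes(16) for y in Y}  # fresh random payload s_y per element
    r = {y: derive_randomness(k, payloads[y], q) for y in Y}
    # Given (k, s_y), the randomness is recomputed exactly: it is pinned down by the payload.
    assert all(derive_randomness(k, payloads[y], q) == r[y] for y in Y)
    print(r)
```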

Choosing the Underlying Encryption Scheme: Our protocols make extensive use
of a homomorphic encryption scheme, and would remain secure (with only small
modifications) under a variety of choices. We chose to work with the El Gamal
scheme (which is multiplicatively homomorphic), although it may seem that the
more natural choice is the Paillier scheme [52], which is additively homomorphic
(indeed, our initial constructions were based on the Paillier scheme).
Using the Paillier scheme, a subtle problem emerges (this was overlooked, e.g.,
in [33]). Recall that for the Paillier scheme $pk = N$ and $sk = \phi(N)$. Now, if $P_1$ knows
$sk$ when she constructs her polynomials, then she may construct a polynomial
$Q(\cdot)$ such that $Q(y) \notin \mathbb{Z}_N^*$ for some specific 'target' value $y$. This would allow
her to learn about $P_2$'s input beyond the intended protocol output. A possible
solution is that $P_1$ and $P_2$ first engage in a protocol to jointly generate $pk$ and
shares of $sk$, where $P_1$ learns $sk$ only after committing to her polynomials.
This, however, introduces high key setup costs, and the result is a protocol that
exhibits low asymptotic costs but, because of its high setup costs, becomes efficient
only for very large inputs.

Efficiency: Our protocols for set intersection and set union, $\pi_\cap$ and $\pi_\cup$, are constant
round, work in the standard model, and do not require a trusted setup. The
underlying encryption scheme is El Gamal, where the keys are selected by party
$P_1$. Neither protocol employs any generic zero-knowledge proof.

Assuming the protocols of [16,21] (which require $p(n)$ oblivious transfers for
realizing the oblivious pseudorandom function evaluation), we get that for sets
$X, Y \subseteq \{0,1\}^{p(n)}$ of $m_X, m_Y$ elements respectively, the costs of $\pi_\cap$ are
sending $O(m_X + m_Y \cdot p(n))$ group elements and computing $O(m_X +
m_Y \cdot (\log\log m_X + p(n)))$ modular exponentiations. Note that this is significantly
better than $O(m_X \cdot m_Y)$.

A significant improvement can be achieved by using a more efficient pseudorandom
function evaluation instead of the function of [29], which requires a single
oblivious transfer for every input bit. This is possible because
our protocol uses oblivious pseudorandom function evaluation as a black
box. Furthermore, for set intersection, another significant improvement can be
achieved if the size of the intersection $m_{X \cap Y}$ is allowed to be leaked (to $P_2$).
The resulting protocol requires sending $O(m_X + m_{X \cap Y} \cdot p(n))$ group elements and computing
$O(m_X + m_Y \cdot \log\log m_X + m_{X \cap Y} \cdot p(n))$ modular exponentiations. When $m_{X \cap Y} \ll m_Y$ we get a
protocol that is more efficient than that of [21]. This type of improvement does not
apply to [21], since there the parties apply the PRF directly to the input set $Y$
and thus cannot deduce $m_{X \cap Y}$ beforehand.

UC Security: Our protocols readily transform into the UC framework, as all our
simulators are straight-line in a hybrid model with access to some specific
zero-knowledge proofs. We show how to modify our set intersection protocol to one
that is secure in the UC framework (in the common reference string model).

For lack of space, we focus only on the protocol for secure set intersection in
the malicious model, and omit the more standard details of the construction
and its proof of security. The missing details and proofs can be found in the full
version of this paper [?].

2 Preliminaries 

Throughout the paper, we denote the security parameter by $n$, and, although
not explicitly specified, input lengths are always assumed to be bounded by some
polynomial in $n$. A probabilistic machine is said to run in polynomial-time (ppt)
if it runs in time that is polynomial in the security parameter $n$ alone. A function
$\mu(n)$ is negligible (in $n$) if for every polynomial $p(\cdot)$ there exists a value $N$ such
that $\mu(n) < 1/p(n)$ for all $n > N$; i.e., $\mu(n) = n^{-\omega(1)}$.


2.1 Secure Two-Party Computation — Definitions 

We use standard definitions of security for two-party computation in the
malicious model, which we now briefly review. The reader is referred to
[?, Chapter 7] for more details and motivating discussion.

We prove the security of our protocols in the setting of malicious adversaries,
which may arbitrarily deviate from the specified protocol. Security is analyzed by
comparing what an adversary can do in a real protocol execution to what it can
do in an ideal scenario. In the ideal scenario, the computation involves an
incorruptible trusted third party to whom the parties send their inputs. The trusted
party computes the functionality on the inputs and returns to each party its
respective output. Informally, the protocol is secure if any adversary interacting
in the real protocol (i.e., where no trusted third party exists) can do no more
harm than what it could do in the ideal scenario. We consider the static setting,
where the adversary is only able to corrupt a party at the outset of the protocol.
Some technical issues arise; for example, it may be impossible to achieve
fairness or guaranteed output delivery, since an adversarial party can prevent an
honest party from receiving its output.

2.2 The El Gamal Encryption Scheme 

The El Gamal encryption scheme operates on a cyclic group $\mathbb{G}$ of prime order $q$.
We will work in the group $\mathbb{Z}^*_{q'}$, where $q' = 2q + 1$ is prime, and set $\mathbb{G}$ to be the
subgroup of $\mathbb{Z}^*_{q'}$ of quadratic residues modulo $q'$ (note that membership in $\mathbb{G}$ can
be easily checked). Let $g$ denote a random generator of $\mathbb{G}$; then the public and
secret keys are $(\mathbb{G}, q, g, h)$ and $(\mathbb{G}, q, g, x)$, where $x \leftarrow_R \mathbb{Z}_q$ and $h = g^x$. A message
$m \in \mathbb{G}$ is encrypted by choosing $y \leftarrow_R \mathbb{Z}_q$, and the ciphertext is $(g^y, h^y \cdot m)$. A
ciphertext $c = (\alpha, \beta)$ is decrypted as $m = \beta/\alpha^x$. We use the property that given
$y = \log_g \alpha$ one can reconstruct $m = \beta/h^y$, and hence a party encrypting $m$ can
prove knowledge of $m$ by proving knowledge of $y$.

The semantic security of the El Gamal scheme follows from the hardness of the
decisional Diffie-Hellman (DDH) problem in $\mathbb{G}$. The El Gamal scheme is homomorphic
relative to multiplication, i.e., if $(\alpha_1, \beta_1)$ encrypts $m_1$ and $(\alpha_2, \beta_2)$ encrypts $m_2$,
then $(\alpha_1 \cdot \alpha_2, \beta_1 \cdot \beta_2)$ encrypts $m_1 \cdot m_2$. We additionally consider a modified
version of El Gamal where the encryption is performed by choosing $y \leftarrow_R \mathbb{Z}_q$ and
computing $(g^y, h^y \cdot g^m)$. Decryption of a ciphertext $c = (\alpha, \beta)$ is performed by
computing $g^m = \beta \cdot \alpha^{-x}$. The fact that $m$ cannot be efficiently recovered is not
problematic for the way El Gamal is incorporated in our protocols. Moreover,
this variant of El Gamal is additively homomorphic and can be used to perform
oblivious linear computations (e.g., polynomial evaluation) in the exponent.
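The following minimal Python sketch illustrates both variants over the quadratic-residue subgroup of a safe prime. The parameters $q' = 2039$, $q = 1019$ and $g = 4$ are assumptions chosen only to keep the arithmetic easy to follow. They offer no security, and the function names are hypothetical.

```python
import secrets

# Toy parameters (assumption, insecure): q' = 2q + 1 with q prime.
Q_PRIME = 2039   # q'
Q = 1019         # q = |G|
G_GEN = 4        # 2^2 is a quadratic residue != 1, hence it generates G


def in_group(a: int) -> bool:
    """Membership in G: a nonzero a is a QR mod q' iff a^q == 1 (mod q')."""
    return 0 < a < Q_PRIME and pow(a, Q, Q_PRIME) == 1


def keygen():
    x = secrets.randbelow(Q)                      # sk: x <-R Z_q
    return pow(G_GEN, x, Q_PRIME), x              # pk: h = g^x


def encrypt(h: int, m: int):
    """Standard El Gamal: the message must already be an element of G."""
    assert in_group(m)
    y = secrets.randbelow(Q)
    return pow(G_GEN, y, Q_PRIME), pow(h, y, Q_PRIME) * m % Q_PRIME


def decrypt(x: int, c):
    alpha, beta = c
    return beta * pow(alpha, -x, Q_PRIME) % Q_PRIME   # beta / alpha^x


def mul(c1, c2):
    """Componentwise product: encrypts the product of the two plaintexts."""
    return c1[0] * c2[0] % Q_PRIME, c1[1] * c2[1] % Q_PRIME


def encrypt_exp(h: int, m: int):
    """Modified ('exponential') El Gamal: encrypt g^m, so mul() adds exponents."""
    return encrypt(h, pow(G_GEN, m % Q, Q_PRIME))


h, x = keygen()
m1, m2 = pow(G_GEN, 5, Q_PRIME), pow(G_GEN, 7, Q_PRIME)
assert decrypt(x, mul(encrypt(h, m1), encrypt(h, m2))) == m1 * m2 % Q_PRIME
assert decrypt(x, mul(encrypt_exp(h, 3), encrypt_exp(h, 4))) == pow(G_GEN, 7, Q_PRIME)
```

In the exponential variant only $g^{m_1 + m_2}$ is recovered, not $m_1 + m_2$ itself. This is consistent with the remark above that $m$ need not be efficiently recoverable.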
2.3 Perfectly Hiding Commitment

We use a perfectly-hiding commitment scheme $(\mathrm{com}, \mathrm{dec})$ with a zero-knowledge
proof of knowledge $\pi_{\mathrm{COM}}$ for the relation

$$\mathcal{R}_{\mathrm{COM}} = \{(c, (m, r)) \mid c = \mathrm{com}(m; r)\},$$

where $\mathrm{com}(m; r)$ denotes the commitment to a message $m$ using random coins
$r$. We instantiate $\mathrm{com}(\cdot;\cdot)$ with Pedersen's commitment scheme [33], using the
same underlying group $\mathbb{G}$ used for the El Gamal scheme. I.e., let $q' = 2q + 1$,
where $q'$ and $q$ are primes, and let $g, h$ be generators of the subgroup $\mathbb{G}$ of quadratic
residues modulo $q'$. A commitment to $m$ is then defined as $\mathrm{com}(m; r) = g^m h^r$,
where $r \leftarrow_R \mathbb{Z}_q$. The scheme is perfectly hiding, as for every $m, r, m'$ there
exists a single $r'$ such that $g^m h^r = g^{m'} h^{r'}$. The scheme is binding assuming
hardness of computing $\log_g h$. However, given $\log_g h$, it is possible to decommit
any commitment $c$ into any message $m \in \mathbb{Z}_q$. We instantiate $\pi_{\mathrm{COM}}$ with the
proof of knowledge from [31] (this proof is not a zero-knowledge proof, yet can
be modified using standard techniques [?]).
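Below is a minimal sketch of the Pedersen commitment and of the trapdoor property mentioned above. It reuses the insecure toy group $q' = 2039$, $q = 1019$, $g = 4$, which is an assumption made for readability. Knowing $t = \log_g h$ lets one open a commitment to any other message, which is why binding rests on the hardness of computing $\log_g h$. The function names are hypothetical.

```python
import secrets

Q_PRIME, Q, G_GEN = 2039, 1019, 4   # toy group parameters (assumption, insecure)


def commit_keygen():
    """Return h = g^t together with the trapdoor t = log_g h (t != 0)."""
    t = 1 + secrets.randbelow(Q - 1)
    return pow(G_GEN, t, Q_PRIME), t


def commit(h: int, m: int, r: int) -> int:
    """Pedersen commitment com(m; r) = g^m * h^r, with m, r in Z_q."""
    return pow(G_GEN, m % Q, Q_PRIME) * pow(h, r % Q, Q_PRIME) % Q_PRIME


def equivocate(t: int, m: int, r: int, m_new: int) -> int:
    """Using t = log_g h, find r' with com(m_new; r') == com(m; r).

    g^m h^r = g^(m + t*r), so we need m_new + t*r' = m + t*r (mod q).
    """
    return (r + (m - m_new) * pow(t, -1, Q)) % Q


h, t = commit_keygen()
r = secrets.randbelow(Q)
c = commit(h, 42, r)
r_new = equivocate(t, 42, r, 7)
assert commit(h, 7, r_new) == c   # same commitment, opened to a different message
```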

2.4 Zero-Knowledge Proofs 


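Our protocols employ zero-knowledge proofs of knowledge for the following
relations (in the following, $\mathbb{G}$ is a group of prime order):

Type | Protocol | Relation/Language | Reference
ZKPK | $\pi_{\mathrm{DL}}$ | $\mathcal{R}_{\mathrm{DL}} = \{((\mathbb{G}, g, h), x) \mid h = g^x\}$ | [32]
ZKPK | $\pi_{\mathrm{DDH}}$ | $\mathcal{R}_{\mathrm{DDH}} = \{((\mathbb{G}, g, g_1, g_2, g_3), x) \mid g_1 = g^x \wedge g_3 = g_2^x\}$ | [?]
ZK | $\pi_{\mathrm{NZ}}$ | $L_{\mathrm{NZ}} = \{(\mathbb{G}, g, h, (\alpha, \beta)) \mid \exists\, (m \neq 0, r) \text{ s.t. } \alpha = g^r \wedge \beta = h^r g^m\}$ | Sec. 2.4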


Zero-Knowledge Proof for $L_{\mathrm{NZ}}$. We use standard techniques for constructing
a zero-knowledge proof for the language of encryptions $(\alpha, \beta)$ of non-zero
exponents of $g$:

$$L_{\mathrm{NZ}} = \{(\mathbb{G}, g, h, (\alpha, \beta)) \mid \exists\, (m \neq 0, r) \text{ s.t. } \alpha = g^r \wedge \beta = h^r g^m\}.$$

The construction is based on a zero-knowledge protocol $\pi_{\mathrm{MULT}}$ for the language

$$L_{\mathrm{MULT}} = \{(\mathbb{G}, g, h, c_1, c_2, c_3) \mid \exists\, m, m' \text{ s.t. } c_1, c_2, c_3 \text{ are encryptions of } g^m, g^{m'}, g^{mm'} \text{ resp.}\}.$$

$\pi_{\mathrm{MULT}}$ is a modification of a protocol by Damgård and Jurik [5] designed for
the Paillier encryption scheme.


2.5 Balanced Allocation 

We employ a scheme for randomly mapping elements into bins, as suggested
in [11]. We use the balanced allocation scheme of [?], where elements are inserted
into $B$ bins as follows. Let $h_0, h_1 : \{0,1\}^{p(n)} \to [B]$ be two randomly chosen
hash functions mapping elements from $\{0,1\}^{p(n)}$ into bins $1, \ldots, B$. An element
$x \in \{0,1\}^{p(n)}$ is inserted into the less occupied bin from $\{h_0(x), h_1(x)\}$, where
ties are broken arbitrarily. If $m$ elements are inserted, then except with negligible
probability over the choice of the hash functions $h_0, h_1$, the maximum number
of elements allocated to any single bin is at most $M = O(m/B + \log\log B)$.
Setting $B = m/\log\log m$ we get that $M = O(\log\log m)$.$^1$ In the protocol we deviate
insignificantly from the description above, and let $P_1$ choose seeds for two
pseudorandom functions that are used as the hash functions $h_0, h_1$.
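The two-choice insertion rule can be simulated in a few lines of Python. This sketch is an illustration, not the paper's implementation. The seeded SHA-256 function `bin_index` is a hypothetical stand-in for the pseudorandom hash functions $h_0, h_1$. Bins are numbered $0, \ldots, B-1$ rather than $1, \ldots, B$, ties go to $h_0(x)$, and $B = m/\log\log m$ is computed with base-2 logarithms.

```python
import hashlib
import math
import random


def bin_index(seed: bytes, x: int, num_bins: int) -> int:
    """Hypothetical stand-in for h_0 / h_1: seeded SHA-256 reduced mod B."""
    digest = hashlib.sha256(seed + x.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % num_bins


def allocate(elements, num_bins, seed0=b"h0", seed1=b"h1"):
    """Insert each element into the less occupied of its two candidate bins."""
    bins = [[] for _ in range(num_bins)]
    for x in elements:
        b0 = bin_index(seed0, x, num_bins)
        b1 = bin_index(seed1, x, num_bins)
        target = b0 if len(bins[b0]) <= len(bins[b1]) else b1   # tie -> b0
        bins[target].append(x)
    return bins


def num_bins_for(m: int) -> int:
    """B = m / log log m (base-2 logs), at least 1."""
    if m < 4:
        return 1
    return max(1, round(m / math.log2(math.log2(m))))


rng = random.Random(2010)
m = 1 << 14
elements = rng.sample(range(1 << 40), m)
B = num_bins_for(m)
bins = allocate(elements, B)
print("B =", B, "max load =", max(len(b) for b in bins))
```

The printed maximum load is what the bound $M = O(\log\log m)$ above controls. The sketch only measures it empirically and makes no claim about the hidden constant.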

2.6 Oblivious PRF Evaluation 

We use a protocol $\pi_{\mathrm{PRF}}$ that obliviously evaluates a pseudorandom function in
the presence of a malicious adversary. Let $I_{\mathrm{PRF}}$ be the indexing algorithm for a
pseudorandom function ensemble, and let $k \leftarrow I_{\mathrm{PRF}}(1^n)$ be a sampled key. The
functionality $\mathcal{F}_{\mathrm{PRF}}$ is defined as

$$(k, x) \mapsto (\lambda, F_{\mathrm{PRF}}(k, x)). \qquad (1)$$

The PRF may be instantiated with the Naor-Reingold pseudorandom function
[25] with the protocol presented in [16] (and proven in [21]). The function is
defined as

$$F_{\mathrm{PRF}}((a_0, \ldots, a_n), x) = g^{a_0 \prod_{i=1}^{n} a_i^{x[i]}},$$

where $\mathbb{G}$ is a group of prime order $q$, $g$ is a generator of $\mathbb{G}$, $a_i \in \mathbb{Z}_q^*$ and
$x = (x[1], \ldots, x[n]) \in \{0,1\}^n$. The protocol involves executing an oblivious transfer
for every bit of the input $x$. Combining this with the fact that $n$ oblivious
transfers require $11n + 29$ exponentiations using the protocol in [34] (the
analysis in [34] includes the cost for generating a common reference string), one
gets a constant-round protocol that securely computes $\mathcal{F}_{\mathrm{PRF}}$ in the presence of
malicious players using a constant number of exponentiations for every bit of
the input $x$.
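For reference, the sketch below shows a plain (non-oblivious) evaluation of the Naor-Reingold function defined above. It uses the same toy group $q' = 2039$, $q = 1019$, $g = 4$, an insecure assumption chosen for readability. It only shows which value $P_1$ obtains from $\mathcal{F}_{\mathrm{PRF}}$. The oblivious-transfer-based protocol that computes this value without revealing $k$ or $x$ is not shown, and the function names are hypothetical.

```python
import secrets

Q_PRIME, Q, G_GEN = 2039, 1019, 4   # toy group parameters (assumption, insecure)


def nr_keygen(n: int):
    """Key (a_0, ..., a_n) with every a_i drawn from Z_q^*."""
    return [1 + secrets.randbelow(Q - 1) for _ in range(n + 1)]


def nr_eval(key, bits):
    """F((a_0,...,a_n), x) = g^(a_0 * prod_{i: x[i] = 1} a_i); exponents mod q."""
    assert len(bits) == len(key) - 1
    exponent = key[0]
    for a_i, bit in zip(key[1:], bits):
        if bit:
            exponent = exponent * a_i % Q
    return pow(G_GEN, exponent, Q_PRIME)


key = nr_keygen(8)
x = [1, 0, 1, 1, 0, 0, 1, 0]
print(nr_eval(key, x))
```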

3 Secure Set Intersection 

We now consider the functionality of set intersection, where each party's input
consists of a set and the size of the other party's input set. If the set sizes
match, then the functionality outputs the intersection of these input sets to $P_1$.
Otherwise $P_1$ is given $\bot$. More formally:

Definition 1. Let $X$ and $Y$ be subsets of a predetermined domain (w.l.o.g., we
assume $X, Y \subseteq \{0,1\}^{p(n)}$ for some polynomial $p(\cdot)$ such that $2^{p(n)}$ is
super-polynomial in $n$, and that the set elements can be represented as elements
of some finite group). The functionality $\mathcal{F}_\cap$ is:

$$\mathcal{F}_\cap((X, m_Y), (Y, m_X)) = \begin{cases} (X \cap Y, \lambda) & \text{if } |X| = m_X,\ |Y| = m_Y \text{ and } X, Y \subseteq \{0,1\}^{p(n)}, \\ (\bot, \lambda) & \text{otherwise.} \end{cases}$$

$^1$ A constant factor improvement is achieved using the Always Go Left scheme in [36],
where $h_0 : \{0,1\}^{p(n)} \to [1, \ldots, B/2]$ and $h_1 : \{0,1\}^{p(n)} \to [B/2 + 1, \ldots, B]$. An element $x$ is
inserted into the less occupied bin from $\{h_0(x), h_1(x)\}$; in case of a tie $x$ is inserted
into $h_0(x)$.

In the rest of this section we present in detail our construction for a protocol
realizing $\mathcal{F}_\cap$ in the presence of malicious adversaries. For completeness we include
a description of the protocol by Freedman et al. for semi-honest parties:

Protocol 1. (set-intersection protocol secure in the presence of semi-honest
parties):

— Inputs: The input of $P_1$ is $m_Y$ and a set $X \subseteq \{0,1\}^{p(n)}$ containing $m_X$
items; the input of $P_2$ is $m_X$ and a set $Y \subseteq \{0,1\}^{p(n)}$ containing $m_Y$ items.

— Auxiliary inputs: A security parameter $1^n$.

— The protocol:

1. Key setup: $P_1$ chooses the secret and public keys $(sk, pk)$ for the
underlying homomorphic encryption scheme (e.g., Paillier or El Gamal).
She sends $pk$ to $P_2$.

2. Setting the balanced allocation scheme: $P_1$ computes the parameters
$B, M$ for the scheme and chooses the seeds for two (pseudo-)random
hash functions $h_0, h_1 : \{0,1\}^{p(n)} \to [B]$. She sends $B, M, h_0, h_1$ to $P_2$.

3. Creating polynomials for the set $X$: For every $x \in X$, $P_1$ maps $x$
into the less occupied bin from $\{h_0(x), h_1(x)\}$ (ties broken arbitrarily).
Let $B_i$ denote the set of elements mapped into bin $i$ and let
$Q_i(x) = \sum_{j=0}^{M} Q_{i,j} \cdot x^j$ denote a polynomial with the set of roots $B_i$. $P_1$ encrypts
the coefficients of the polynomials and sends the encrypted coefficients to
$P_2$.

4. Substituting in the polynomials: Let $y_1, \ldots, y_{m_Y}$ be a random ordering
of the elements of set $Y$. $P_2$ does the following for all $\alpha \in \{1, \ldots, m_Y\}$:

(a) He sets $h_0 = h_0(y_\alpha)$, $h_1 = h_1(y_\alpha)$.

(b) He chooses two random elements $r_0, r_1$ in the underlying group of the
homomorphic encryption scheme. He then uses the homomorphic
properties of the encryption scheme to compute an encryption
of $r_0 \cdot Q_{h_0}(y_\alpha) + y_\alpha$ and of $r_1 \cdot Q_{h_1}(y_\alpha) + y_\alpha$. Both encrypted values are
sent to $P_1$.

5. Computing the intersection: $P_1$ decrypts each received value. If the
decrypted value is in $X$ then $P_1$ records it as part of her local output.

Note that, since the parties are semi-honest, $P_1$ outputs $X \cap Y$ with probability
negligibly close to 1. (i) For elements $y_\alpha \in X \cap Y$ we get that $Q_{h_0(y_\alpha)}(y_\alpha) = 0$
or $Q_{h_1(y_\alpha)}(y_\alpha) = 0$; hence one of the corresponding encrypted values is $y_\alpha$ itself,
and $P_1$ records it in her local output. (ii) For $y_\alpha \notin X \cap Y$ we get that
$Q_{h_0(y_\alpha)}(y_\alpha) \neq 0$ and $Q_{h_1(y_\alpha)}(y_\alpha) \neq 0$, and hence the corresponding encrypted values
$r_0 \cdot Q_{h_0(y_\alpha)}(y_\alpha) + y_\alpha$ and $r_1 \cdot Q_{h_1(y_\alpha)}(y_\alpha) + y_\alpha$ are random values that fall within $X$
with only negligible probability.
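The correctness argument can be checked with a plaintext-only Python sketch of Protocol 1. It deliberately omits the encryption layer, so it provides no privacy at all. It uses the prime $2^{61} - 1$ as an assumed stand-in for the plaintext space and a seeded SHA-256 function `bin_index` as a hypothetical stand-in for $h_0, h_1$. Degree padding of the polynomials is also omitted.

```python
import hashlib
import random

P = 2**61 - 1   # prime stand-in for the plaintext space (assumption)


def bin_index(seed: int, v: int, num_bins: int) -> int:
    """Hypothetical stand-in for h_0 (seed 0) and h_1 (seed 1)."""
    digest = hashlib.sha256(f"{seed}:{v}".encode()).digest()
    return int.from_bytes(digest, "big") % num_bins


def poly_from_roots(roots):
    """Coefficients [c_0, ..., c_d] of prod (x - r) mod P; [1] for no roots."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] = (nxt[j] - r * c) % P        # contributes -r * c * x^j
            nxt[j + 1] = (nxt[j + 1] + c) % P    # contributes c * x^(j+1)
        coeffs = nxt
    return coeffs


def evaluate(coeffs, y):
    acc = 0
    for c in reversed(coeffs):                   # Horner's rule
        acc = (acc * y + c) % P
    return acc


def semi_honest_intersection(X, Y, num_bins, rng):
    bins = [[] for _ in range(num_bins)]
    for x in X:                                  # step 3: two-choice allocation
        b0, b1 = bin_index(0, x, num_bins), bin_index(1, x, num_bins)
        bins[b0 if len(bins[b0]) <= len(bins[b1]) else b1].append(x)
    polys = [poly_from_roots(b) for b in bins]   # would be sent encrypted
    output = set()
    for y in Y:                                  # step 4: P2 substitutes y
        for b in (bin_index(0, y, num_bins), bin_index(1, y, num_bins)):
            r = rng.randrange(1, P)
            z = (r * evaluate(polys[b], y) + y) % P
            if z in X:                           # step 5: P1 checks membership
                output.add(z)
    return output


X = set(range(0, 200, 2))
Y = set(range(0, 200, 3))
print(sorted(semi_honest_intersection(X, Y, 16, random.Random(1))))
```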
Efficiency. The protocol runs in a constant number of rounds. The communication
costs are of sending the encrypted polynomials ($BM$ values) and the
encrypted $r_0 \cdot Q_{h_0}(y_\alpha) + y_\alpha$ and $r_1 \cdot Q_{h_1}(y_\alpha) + y_\alpha$ ($2m_Y$ values). Using the El
Gamal or Paillier encryption schemes, the computation costs are of encrypting
the polynomials ($O(BM)$ exponentiations) and of obliviously computing the
encryptions of $r_0 \cdot Q_{h_0}(y_\alpha) + y_\alpha$ and $r_1 \cdot Q_{h_1}(y_\alpha) + y_\alpha$ ($O(M m_Y)$ exponentiations).
Overall, the communication costs are of sending $O(m_X + m_Y)$
encryptions, and the computation costs are of performing $O(m_X + m_Y \log\log m_X)$
modular exponentiations.
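As a quick check of these bounds, plug in the parameter choice from Section 2.5 with $m = m_X$ (the elements being allocated are those of $X$). The $O(\cdot)$ constant is the one from the balanced allocation bound:

$$BM = \frac{m_X}{\log\log m_X} \cdot O(\log\log m_X) = O(m_X), \qquad M \cdot m_Y = O(m_Y \log\log m_X).$$

Hence the $BM$ encrypted coefficients plus the $2m_Y$ returned ciphertexts total $O(m_X + m_Y)$ encryptions, and the exponentiations total $O(m_X + m_Y \log\log m_X)$.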

3.1 Constructing a Protocol for Malicious Parties 

We note a couple of issues that need to be addressed in transforming the above
protocol for semi-honest parties into a protocol for malicious parties:

1. It is easy for $P_1$ to construct the $B$ polynomials such that she learns
about elements that are not in the intersection $X \cap Y$. For instance, if $Q_i(\cdot)$ is
identically zero then $P_1$ learns all elements $\{y \in Y : h_0(y) = i \text{ or } h_1(y) = i\}$.
Similarly, if the sum of degrees of $Q_1, \ldots, Q_B$ exceeds $m_X$ then $P_1$ may learn
about more than $m_X$ elements in $P_2$'s input.

To resolve these problems we introduce a zero-knowledge protocol for
verifying that $Q_i \not\equiv 0$ for all $i \in \{1, \ldots, B\}$, and that $\sum_i \deg(Q_i) = m_X$.

2. While party $P_2$ is supposed to send $m_Y$ pairs of encryptions resulting from
substituting a value $y$ (known to $P_2$) in the (encrypted) polynomials $Q_{h_0(y)}$
and $Q_{h_1(y)}$, he may deviate from his prescribed computation. Thus, $P_2$'s
input to the protocol may be ill defined. A solution suggested in [?] solves
this problem partially, as it deals with the case where each element $y$ is
substituted in a single polynomial. This solution avoids the standard usage of
zero-knowledge proofs by $P_2$ that he indeed followed the protocol. Instead, it
enables party $P_1$ to redo the entire computation supposedly carried out by
$P_2$ on $y$ and verify that its outcome is consistent with the messages received
from $P_2$ (this is where the construction uses a random oracle).

We remove the dependency on the random oracle and present a solution
for the case where $y$ is substituted in two polynomials.


3.2 Checking the Polynomials 

Our set-intersection protocol utilizes a zero-knowledge proof of knowledge for
the relation

$$\mathcal{R}_{\mathrm{POLY}} = \left\{ \left(\{q_{i,j}\}_{i,j}, m_X, pk\right), \left(\{Q_{i,j}, r_{i,j}\}_{i,j}\right) \;\middle|\; q_{i,j} = E_{pk}(g^{Q_{i,j}}; r_{i,j}) \,\wedge\, \sum_i \deg(Q_i) = m_X \,\wedge\, \forall i,\ Q_i(\cdot) \not\equiv 0 \right\},$$

where $i \in \{1, \ldots, B\}$ and $j \in \{0, \ldots, M\}$.$^2$

$^2$ We will use the convention that the degree of a polynomial $Q_i(\cdot)$ can be chosen to be
any integer $j'$ such that $Q_{i,j} = 0$ for all $j > j'$; hence equality with $m_X$ can always
be achieved.
3.3 Secure Set-Intersection in the Presence of Malicious Adversaries

We now get to the main contribution of this work: a protocol that securely
computes $\mathcal{F}_\cap$ in the presence of malicious adversaries, in the standard model;
see Figure 1. The main ingredient is a subtle combination of an oblivious
pseudorandom function evaluation protocol and a perfectly hiding commitment scheme.
Somewhat counterintuitively, the oblivious PRF need not be committed (in the
sense that the same key is being reused). The proof of security shows that
although party $P_2$, who controls the key, may change it between invocations, this
does not give him any advantage.

The oblivious PRF is used to avoid a (generic) zero-knowledge protocol
for party $P_2$'s adherence to the protocol. Recall that in the protocol of Freedman
et al., $P_1$ learns for every $y \in Y$ two values, $r_0 \cdot Q_{h_0(y)}(y) + y$ and $r_1 \cdot Q_{h_1(y)}(y) + y$,
where $r_0, r_1$ are randomly distributed. If $y \notin X$ then both values are random
and reveal no information about $y$. If $y \in X$ then one of these values equals
$y$. In our protocol, the 'payload' $y$ of this computation is replaced by a secret $s$
(meaning that $P_1$ learns two values, $r_0 \cdot Q_{h_0(y)}(y) + s$ and $r_1 \cdot Q_{h_1(y)}(y) + s$). The
result of the polynomial evaluation step is, hence, that if $y \notin X$ then $P_1$ learns
no information about $s$, and if $y \in X$ then $P_1$ learns $s$.

The crux of our construction is that the strings $r_0, r_1$ (as well as others) are not
really random. These are pseudorandom strings that are directly derived from
$F_{\mathrm{PRF}}(k, s)$. What we get is that if $y \notin X$ then $P_1$ learns nothing about $s$ or $y$.
If, on the other hand, $y \in X$ then $P_1$ learns $s$, and furthermore, after $P_1$ invokes
the oblivious PRF protocol, she can recover $y$ and check that the computations
$P_2$ performed based on the other 'random' strings were performed correctly.

A complication arises because $P_2$ (who selects the key $k$ for the PRF) computes
$F_{\mathrm{PRF}}(k, s)$ by himself, and hence it is impossible for the simulator to extract
$s$ from this computation. We thus provide the simulator with an alternative
means of extracting $s$ (and also the corresponding $y$ value) by having $P_2$ commit
to both. To guarantee independence of inputs (i.e., that $P_1$ would not be able to
choose her inputs depending on $P_2$'s commitment or vice versa), this commitment
is perfectly hiding and is performed before $P_1$ sends the encrypted polynomials
representing her input set $X$.

We continue with a formal description of the protocol.

Protocol 2. ($\pi_\cap$: secure set-intersection):

— Inputs: The input of $P_1$ is $m_Y$ and a set $X \subseteq \{0,1\}^{p(n)}$ containing $m_X$
items; the input of $P_2$ is $m_X$ and a set $Y \subseteq \{0,1\}^{p(n)}$ containing $m_Y$ items
(hence, both parties know $m_X$ and $m_Y$).

— Auxiliary inputs: A security parameter $1^n$, a prime $q'$ such that $q' = 2q + 1$
for a prime $q$. The group $\mathbb{G}$ is the subgroup of quadratic residues modulo $q'$
and $g$ is a generator of $\mathbb{G}$.

— Convention: Both parties check every received ciphertext for validity (i.e.,
that it is in $\mathbb{G}$), and abort otherwise.
Fig. 1. A high-level diagram of $\pi_\cap$ between $P_1(X, m_Y)$ and $P_2(Y = \{y_\alpha\}_{\alpha \in \{1,\ldots,m_Y\}}, m_X)$.


— The protocol:

1. Key setup for the encryption and commitment schemes: $P_1$
chooses $t, t' \leftarrow_R \mathbb{Z}_q$, sets $h = g^t$, $h' = g^{t'}$ and sends $h, h'$ to $P_2$. The
key for the Pedersen commitment scheme is $h$. The public and private
keys for the El Gamal scheme are $pk = h'$ and $sk = t'$. $P_1$ proves knowledge
of $\log_g h$ and $\log_g h'$ using the zero-knowledge proof of knowledge
for $\mathcal{R}_{\mathrm{DL}}$.
2. Setting the balanced allocation scheme: $P_1$ computes the parameters
$B, M$ for the scheme and chooses the seeds for two (pseudo-)random
hash functions $h_0, h_1 : \{0,1\}^{p(n)} \to [B]$. She sends $B, M, h_0, h_1$ to $P_2$, who
checks that the parameters $B, M$ were computed correctly, and aborts
otherwise.

3. $P_2$ commits to his input set: Let $y_1, \ldots, y_{m_Y}$ be a random ordering of
the elements of set $Y$. For all $\alpha \in \{1, \ldots, m_Y\}$, $P_2$ chooses $s_\alpha \leftarrow_R \mathbb{Z}_q$ and
sends $\mathrm{com}_\alpha = \mathrm{com}(y_\alpha; s_\alpha) = h^{y_\alpha} g^{s_\alpha}$ to $P_1$. $P_2$ then proves knowledge
of $y_\alpha$ and $s_\alpha$ by invoking the zero-knowledge proof of knowledge for $\mathcal{R}_{\mathrm{COM}}$.

4. $P_1$ creates the polynomials representing her input set: For every
$x \in X$, $P_1$ maps $x$ into the less occupied bin from $\{h_0(x), h_1(x)\}$ (ties
broken arbitrarily). Let $B_i$ denote the set of elements mapped into bin
$i$. $P_1$ constructs a polynomial $Q_i(x) = \sum_{j=0}^{M} Q_{i,j} \cdot x^j$ of degree at most
$M$ whose set of roots is $B_i$.$^3$ $P_1$ encrypts the polynomials' coefficients,
setting $q_{i,j} = E_{pk}(g^{Q_{i,j}}; r_{i,j})$, and sends the encrypted coefficients to $P_2$.

5. Checking the polynomials: $P_1$ and $P_2$ engage in a zero-knowledge
execution $\pi_{\mathrm{POLY}}$ in which $P_1$ proves that the sets $\{q_{i,j}\}_{i \in \{1,\ldots,B\}, j \in \{0,\ldots,M\}}$
were computed correctly, using her witness $\{Q_{i,j}, r_{i,j}\}_{i,j}$.
If the outcome is not 1 then $P_2$ aborts.

6. Evaluating the polynomials: $P_2$ chooses $k \leftarrow I_{\mathrm{PRF}}(1^n)$. Then, $P_2$
performs the following for all $\alpha \in \{1, \ldots, m_Y\}$:

(a) $P_2$ computes $F_{\mathrm{PRF}}(k, s_\alpha)$ and parses the result to obtain pseudorandom
strings $r_0, r_1, \tilde{r}_0, \tilde{r}_1$ of appropriate lengths for their usage below
(i.e., $r_0 \| r_1 \| \tilde{r}_0 \| \tilde{r}_1 = F_{\mathrm{PRF}}(k, s_\alpha)$).

(b) He sets $h_0 = h_0(y_\alpha)$ and $h_1 = h_1(y_\alpha)$.

(c) He uses the homomorphic properties of the encryption scheme
to evaluate $e^0_\alpha = E_{pk}\big(((s_\alpha)^2 \bmod q') \cdot g^{r_0 \cdot Q_{h_0}(y_\alpha)}; \tilde{r}_0\big)$ and
$e^1_\alpha = E_{pk}\big(((s_\alpha)^2 \bmod q') \cdot g^{r_1 \cdot Q_{h_1}(y_\alpha)}; \tilde{r}_1\big)$ (where $\tilde{r}_0, \tilde{r}_1$ denote here the
randomness used in the re-encryptions). Then he sends $e^0_\alpha, e^1_\alpha$ to $P_1$.

7. Computing the intersection: For each $\alpha \in \{1, \ldots, m_Y\}$:

(a) $P_1$ computes $z^0_\alpha = D_{sk}(e^0_\alpha)$ and $z^1_\alpha = D_{sk}(e^1_\alpha)$. For each of the (up to
four) roots $\rho$ of $z^0_\alpha, z^1_\alpha$ (roots are computed modulo $q' = 2q + 1$ and the
result is considered only if it falls within $\mathbb{Z}_q$), she checks whether $\mathrm{com}_\alpha / g^\rho$
coincides with $h^{x_\alpha}$ for some $x_\alpha \in X$ (this can be done efficiently by
creating a hash table for the set $\{h^x : x \in X\}$), and if this is the case she
sets $s_\alpha$ to the corresponding root and marks $\alpha$.

(b) $P_1$ and $P_2$ engage in an execution of the protocol for $\mathcal{F}_{\mathrm{PRF}}$. If $\alpha$ is
marked, then $P_1$ enters $s_\alpha$ as input, and otherwise she enters a zero.
$P_2$ enters $k$ as input. Let $r'_0 \| r'_1 \| \tilde{r}'_0 \| \tilde{r}'_1$ denote $P_1$'s output from this
execution.

$^3$ If $B_i = \emptyset$ then $P_1$ sets $Q_i(x) = 1$. Otherwise, if $|B_i| < M$ then $P_1$ sets the $M - |B_i|$
highest-degree coefficients of $Q_i(\cdot)$ to zero.
(c) If $\alpha$ is marked, then $P_1$ checks that $e^0_\alpha = E_{pk}\big((s_\alpha)^2 \cdot g^{r'_0 \cdot Q_{h_0(x_\alpha)}(x_\alpha)}; \tilde{r}'_0\big)$
and $e^1_\alpha = E_{pk}\big((s_\alpha)^2 \cdot g^{r'_1 \cdot Q_{h_1(x_\alpha)}(x_\alpha)}; \tilde{r}'_1\big)$ result from applying the
homomorphic operations on the encrypted polynomials and randomness
$\tilde{r}'_0, \tilde{r}'_1$. If all checks succeed $P_1$ records $x_\alpha$ as part of her output.

A word of explanation is needed for the computation done in Step 6c. A natural
choice for the payload is $s_\alpha$ itself. However, $s_\alpha \in \mathbb{Z}_q$ whereas the message space
of the El Gamal encryption is $\mathbb{G}$. Note that $\mathbb{Z}_q \subset \mathbb{Z}^*_{q'}$ (neglecting $0 \in \mathbb{Z}_q$), and
that squaring an element of $\mathbb{Z}^*_{q'}$ yields an element of $\mathbb{G}$. Hence, by treating
$s_\alpha$ as an element of $\mathbb{Z}^*_{q'}$ and computing $(s_\alpha)^2 \bmod q'$, we obtain an element of $\mathbb{G}$.
In this mapping, (up to) two elements of $\mathbb{Z}_q$ share an image in $\mathbb{G}$, and hence in
Step 7a we need to recover and check both pre-images.
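The following Python sketch traces Step 6c and Step 7a for a single bin polynomial and a single element, under several simplifying assumptions. The toy safe prime $q' = 2039$ (so $q = 1019$ and $g = 4$) is insecure. The value $r_0$ and the re-encryption randomness are drawn with `secrets` instead of being parsed from $F_{\mathrm{PRF}}(k, s_\alpha)$. All function names are hypothetical. Because $q' \equiv 3 \pmod 4$, the square roots of $z$ are $\pm z^{(q'+1)/4} \bmod q'$. For $s_\alpha \in \{1, \ldots, q-1\}$, the second root $q' - s_\alpha$ is at least $q + 2$ and is therefore discarded by the range check.

```python
import secrets

Q_PRIME, Q, G_GEN = 2039, 1019, 4    # toy safe-prime group (assumption, insecure)


def rand_zq_star() -> int:
    return 1 + secrets.randbelow(Q - 1)


def enc(pk: int, m: int):
    """El Gamal encryption of a group element m under pk = g^t'."""
    y = secrets.randbelow(Q)
    return pow(G_GEN, y, Q_PRIME), pow(pk, y, Q_PRIME) * m % Q_PRIME


def dec(sk: int, c) -> int:
    return c[1] * pow(c[0], -sk, Q_PRIME) % Q_PRIME


def poly_from_roots(roots):
    """Coefficients of prod (x - r), computed mod q since they are exponents of g."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] = (nxt[j] - r * c) % Q
            nxt[j + 1] = (nxt[j + 1] + c) % Q
        coeffs = nxt
    return coeffs


def p2_step_6c(pk, enc_coeffs, y, s):
    """From q_j = E(g^{Q_j}), build E((s^2 mod q') * g^{r_0 * Q(y)})."""
    r0 = rand_zq_star()                  # stand-in for the PRF-derived r_0
    a, b = 1, 1
    for j, (cj_a, cj_b) in enumerate(enc_coeffs):
        e = r0 * pow(y, j, Q) % Q        # exponent r_0 * y^j
        a = a * pow(cj_a, e, Q_PRIME) % Q_PRIME
        b = b * pow(cj_b, e, Q_PRIME) % Q_PRIME
    p_a, p_b = enc(pk, s * s % Q_PRIME)  # fresh encryption also re-randomizes
    return a * p_a % Q_PRIME, b * p_b % Q_PRIME


def square_roots(z: int):
    root = pow(z, (Q_PRIME + 1) // 4, Q_PRIME)
    return {root, Q_PRIME - root}


def p1_step_7a(sk, h, X, com, ciphertext):
    """Decrypt, keep roots inside Z_q, and look up com / g^rho in {h^x}."""
    table = {pow(h, x, Q_PRIME): x for x in X}
    hits = []
    for rho in square_roots(dec(sk, ciphertext)):
        if 0 < rho < Q:
            candidate = com * pow(G_GEN, -rho, Q_PRIME) % Q_PRIME
            if candidate in table:
                hits.append((rho, table[candidate]))
    return hits


t, t_prime = rand_zq_star(), rand_zq_star()
h, pk = pow(G_GEN, t, Q_PRIME), pow(G_GEN, t_prime, Q_PRIME)   # step 1
X = [5, 17, 300]                                                 # one bin of P1
enc_coeffs = [enc(pk, pow(G_GEN, c, Q_PRIME)) for c in poly_from_roots(X)]
y, s = 17, rand_zq_star()
com = pow(h, y, Q_PRIME) * pow(G_GEN, s, Q_PRIME) % Q_PRIME      # h^y g^s
hits = p1_step_7a(t_prime, h, X, com, p2_step_6c(pk, enc_coeffs, y, s))
assert (s, 17) in hits
```

With parameters this small, a ciphertext for some $y \notin X$ decrypts to an essentially random $z$ and can produce a spurious hit with probability on the order of $|X|/q$. The "negligible probability" statements in the text rely on a cryptographically large $q$.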

Before getting into the proof of security, we observe that if both parties are
honest, then $P_1$ outputs $X \cap Y$ with probability negligibly close to one. In this case, if
for an element $y_\alpha \in Y$ it holds that $y_\alpha \in X$, then one of $Q_{h_0(y_\alpha)}(y_\alpha), Q_{h_1(y_\alpha)}(y_\alpha)$
is zero, and otherwise neither $Q_{h_0(y_\alpha)}(y_\alpha)$ nor $Q_{h_1(y_\alpha)}(y_\alpha)$ is zero. We get:

1. If $y_\alpha \in X \cap Y$ then one of $e^0_\alpha, e^1_\alpha$ encrypts $(s_\alpha)^2 \bmod q'$. Hence, there exists
a root $\rho$ of $z^0_\alpha, z^1_\alpha$ such that $\mathrm{com}_\alpha / g^\rho$ coincides with $h^{x_\alpha}$ for some $x_\alpha \in X$,
resulting in $P_1$ marking $\alpha$. Furthermore, as $r_0, r_1, \tilde{r}_0, \tilde{r}_1$ are derived from
$F_{\mathrm{PRF}}(k, s_\alpha)$, the check done by $P_1$ in Step 7c succeeds and $P_1$ records $y_\alpha$ in
her output.

2. If $y_\alpha \notin X \cap Y$ then neither $e^0_\alpha$ nor $e^1_\alpha$ encrypts $(s_\alpha)^2 \bmod q'$, and (except with
negligible probability) there is no root $\rho$ of $z^0_\alpha, z^1_\alpha$ and $x_\alpha \in X$ such that
$\mathrm{com}_\alpha / g^\rho$ coincides with $h^{x_\alpha}$. Hence, $P_1$ does not mark $\alpha$, and $y_\alpha$ is not
considered in Step 7c and is not included in $P_1$'s output.

Theorem 1. Assume that $\pi_{\mathrm{DL}}$, $\pi_{\mathrm{PRF}}$ and $\pi_{\mathrm{POLY}}$ are as described above, that
$(G, E, D)$ is the El Gamal encryption scheme, and that com is a perfectly-hiding
commitment scheme. Then $\pi_\cap$ (Protocol 2) securely computes $\mathcal{F}_\cap$ in the presence
of malicious adversaries.

Efficiency. We first note that the protocol is constant round (as all its
zero-knowledge proofs and subprotocols are constant round). The cost of using current
implementations of $\mathcal{F}_{\mathrm{PRF}}$ on inputs of length $p(n)$ is that of $p(n)$ oblivious
transfer invocations [21], and hence of $O(p(n))$ modular exponentiations. We get
that the overall communication costs are of sending $O(m_X + m_Y p(n))$ group
elements, and the computation costs are of performing $O(m_X + m_Y(\log\log m_X + p(n)))$
modular exponentiations.

Optimizations. Note first that if the functionality is changed to allow $P_2$ to learn the
size of the intersection $m_{X\cap Y}$, then, in Step 7b, it is possible to avoid invoking $\pi_{\mathrm{PRF}}$
when $\alpha$ is not marked. This yields a protocol where $O(m_X + m_{X\cap Y} \cdot p(n))$ group
elements are sent, and $O(m_X + m_Y \cdot \log\log m_X + m_{X\cap Y} \cdot p(n))$ exponentiations are
computed. When $m_{X\cap Y} \ll m_Y$, this protocol is significantly more efficient than
those suggested in [21] for weaker adversarial models. Furthermore, an improved
scheme for oblivious pseudorandom function evaluation whose overall complexity
is independent of the input length would improve efficiency as well.
3.4 A Very Efficient Heuristic Construction

Note that we can now modify protocol $\pi_\cap$ to get a protocol $\pi_\cap^{\mathrm{RO}}$ in the random oracle
model by making the following two changes. (i) The computation of
$F_{\mathrm{PRF}}(k, s_\alpha)$ performed by $P_2$ in Step 6a of $\pi_\cap$ is replaced with an invocation of
the random oracle, i.e., $P_2$ computes $\mathcal{H}(s_\alpha)$. (ii) The execution of the secure
protocol for evaluating $\mathcal{F}_{\mathrm{PRF}}$ by $P_1$ and $P_2$ in Step 7b of $\pi_\cap$ is replaced with an
invocation of the random oracle by $P_1$, i.e., no communication is needed, and
instead of providing $s'$ to the protocol for $\mathcal{F}_{\mathrm{PRF}}$, $P_1$ computes $\mathcal{H}(s')$.
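A heuristic instantiation of $\mathcal{H}$ (or of the primitive Gen discussed next) might expand SHA-256 in counter mode and split the output into the four values $r_0, r_1, \tilde{r}_0, \tilde{r}_1$. The sketch below is only an illustrative assumption, not the construction analyzed in the paper. The name `gen`, the counter-mode expansion and the toy modulus $q = 1019$ are all hypothetical choices. It shows that $P_1$ recomputes exactly the same values locally, with no interaction.

```python
import hashlib

Q = 1019   # toy group order (assumption); a real instantiation uses a large q


def gen(s: int, q: int = Q, parts: int = 4):
    """Hypothetical H/Gen: derive (r_0, r_1, r~_0, r~_1) in Z_q from s.

    Each part is SHA-256(s || counter) reduced mod q; with a 256-bit digest the
    reduction bias is negligible for any q far below 2^256.
    """
    seed = s.to_bytes(32, "big")
    out = []
    for ctr in range(parts):
        digest = hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        out.append(int.from_bytes(digest, "big") % q)
    return tuple(out)


# P2 in Step 6a and P1 in Step 7b both evaluate gen(s_alpha) locally.
r0, r1, r0_tilde, r1_tilde = gen(123)
assert gen(123) == (r0, r1, r0_tilde, r1_tilde)
```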

A typical proof of security in the random oracle model relies on the simulator's
ability to record the inputs on which the random oracle is invoked. The simulator
for malicious $P_2$ then uses the recorded information to recover
his input. In other words, the proof of security relies on the property of the
random oracle that the only way to learn any information about $\mathcal{H}(s)$ is to
apply $\mathcal{H}$ on a well-defined input $s$. Should $\pi_\cap^{\mathrm{RO}}$ be implemented such that the
invocations of the random oracle are replaced by a concrete computation of some
function, it seems that this proof of security would collapse, even if very strong
hardness assumptions are made with respect to this implementation.

Nevertheless, the situation in protocol $\pi_\cap$ is very different. Note, in particular,
that the simulator for malicious $P_2$ cannot monitor $P_2$'s input to $F_{\mathrm{PRF}}$ (nor is this
notion of inputs to the function well defined). Instead, the simulator extracts $s$
from the zero-knowledge proof of knowledge for the commitment on $P_2$'s inputs
in Step 3 of $\pi_\cap$. This is inherited by the modified protocol $\pi_\cap^{\mathrm{RO}}$. Hence, should
the random oracle calls in $\pi_\cap^{\mathrm{RO}}$ be replaced with some primitive Gen, the proof of
security may still hold with small modifications, given a suitable hardness assumption
on Gen (intuitively, some functions of the outcome of $\mathrm{Gen}(s)$ and $s$ should look
random). $\pi_\cap^{\mathrm{RO}}$ can hence be viewed as an intermediate step between the protocol
in [?] that utilizes a random oracle to cope with malicious parties, and the
protocol suggested in the current paper. If the primitive Gen is realized efficiently
(e.g., if its computation incurs a constant number of exponentiations), we get
an extremely efficient protocol for $\mathcal{F}_\cap$, where the communication costs are of
sending $O(m_X + m_Y)$ group elements, and the number of exponentiations is
$O(m_X + m_Y \log\log m_X)$.

For the sake of completeness we include a formal description of protocol $\pi_\cap^{\mathrm{Gen}}$,
which is identical to protocol $\pi_\cap$ except that every invocation of
$F_{\mathrm{PRF}}(k, \cdot)$ is replaced by a computation of $\mathrm{Gen}(\cdot)$. Note that, unlike an invocation of
$F_{\mathrm{PRF}}(k, \cdot)$, computing $\mathrm{Gen}(\cdot)$ requires no communication.

Protocol 3. ($\pi_\cap^{\mathrm{Gen}}$: secure set-intersection with a “generator”):

— Inputs: The input of $P_1$ is $m_Y$ and a set $X \subseteq \{0,1\}^{p(n)}$ containing $m_X$
items; the input of $P_2$ is $m_X$ and a set $Y \subseteq \{0,1\}^{p(n)}$ containing $m_Y$ items
(hence, both parties know $m_X$ and $m_Y$).

— Auxiliary inputs: A security parameter $1^n$, a prime $q'$ such that $q' = 2q + 1$
for a prime $q$. The group $\mathbb{G}$ is the subgroup of quadratic residues modulo $q'$
and $g$ is a generator of $\mathbb{G}$.

— Convention: Both parties check every received ciphertext for validity (i.e.,
that it is in $\mathbb{G}$), and abort otherwise.

— The protocol: 

1. Key setup for the encryption and commitment schemes: $P_1$
chooses $t, t' \leftarrow_R \mathbb{Z}_q$, sets $h = g^t$, $h' = g^{t'}$ and sends $h, h'$ to $P_2$. The
key for the Pedersen commitment scheme is $h$. The public and private
keys for the El Gamal scheme are $pk = h'$ and $sk = t'$. $P_1$ proves knowledge
of $\log_g h$ and $\log_g h'$ using the zero-knowledge proof of knowledge
for $\mathcal{R}_{\mathrm{DL}}$.

2. Setting the balanced allocation scheme: $P_1$ computes the parameters
$B, M$ for the scheme and chooses the seeds for two (pseudo-)random
hash functions $h_0, h_1 : \{0,1\}^{p(n)} \to [B]$. She sends $B, M, h_0, h_1$ to $P_2$, who
checks that the parameters $B, M$ were computed correctly, and aborts
otherwise.

3. $P_2$ commits to his input set: Let $y_1, \ldots, y_{m_Y}$ be a random ordering of
the elements of set $Y$. For all $\alpha \in \{1, \ldots, m_Y\}$, $P_2$ chooses $s_\alpha \leftarrow_R \mathbb{Z}_q$ and
sends $\mathrm{com}_\alpha = \mathrm{com}(y_\alpha; s_\alpha) = h^{y_\alpha} g^{s_\alpha}$ to $P_1$. $P_2$ then proves knowledge
of $y_\alpha$ and $s_\alpha$ by invoking the zero-knowledge proof of knowledge for $\mathcal{R}_{\mathrm{COM}}$.

4. $P_1$ creates the polynomials representing her input set: For every
$x \in X$, $P_1$ maps $x$ into the less occupied bin from $\{h_0(x), h_1(x)\}$ (ties
broken arbitrarily). Let $B_i$ denote the set of elements mapped into bin
$i$. $P_1$ constructs a polynomial $Q_i(x) = \sum_{j=0}^{M} Q_{i,j} \cdot x^j$ of degree at most
$M$ whose set of roots is $B_i$.$^4$ $P_1$ encrypts the polynomials' coefficients,
setting $q_{i,j} = E_{pk}(g^{Q_{i,j}}; r_{i,j})$, and sends the encrypted coefficients to $P_2$.

5. Checking the polynomials: $P_1$ and $P_2$ engage in a zero-knowledge
execution $\pi_{\mathrm{POLY}}$ in which $P_1$ proves that the sets $\{q_{i,j}\}_{i \in \{1,\ldots,B\}, j \in \{0,\ldots,M\}}$
were computed correctly, using her witness $\{Q_{i,j}, r_{i,j}\}_{i,j}$.
If the outcome is not 1 then $P_2$ aborts.

6. Evaluating the polynomials: For all $\alpha \in \{1, \ldots, m_Y\}$, $P_2$ performs
the following:

(a) He sets $h_0 = h_0(y_\alpha)$ and $h_1 = h_1(y_\alpha)$.

(b) He parses $\mathrm{Gen}(s_\alpha)$ to obtain pseudorandom strings $r_0, r_1, \tilde{r}_0, \tilde{r}_1$ of
appropriate lengths for their usage below (i.e., $r_0 \| r_1 \| \tilde{r}_0 \| \tilde{r}_1 = \mathrm{Gen}(s_\alpha)$).

(c) He uses the homomorphic properties of the encryption scheme
to evaluate $e^0_\alpha = E_{pk}\big(((s_\alpha)^2 \bmod q') \cdot g^{r_0 \cdot Q_{h_0}(y_\alpha)}; \tilde{r}_0\big)$ and
$e^1_\alpha = E_{pk}\big(((s_\alpha)^2 \bmod q') \cdot g^{r_1 \cdot Q_{h_1}(y_\alpha)}; \tilde{r}_1\big)$ (where $\tilde{r}_0, \tilde{r}_1$ denote here the
randomness used in the re-encryptions). Then he sends $e^0_\alpha, e^1_\alpha$ to $P_1$.

7. Computing the intersection: For each a £ {1, ... ,myj: 

(a) Pi computes = D s k(e° a ) and z * = Dskleff). For each of the (up to 
four) roots p of z Q a , z * (roots are computed modulo q' = 2q+l and the 

4 If Bi = 0 then Pi sets Qi(x ) = 1. Otherwise, if \Bi\ < M then P\ sets the M +1 — \Bi\ 

highest-degree coefficients of Qi(-) to zero. 
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result is considered only if it falls within Z q ), she checks if com a /g p 
coincides with h Xa for some x a € X (this can be done efficiently by 
creating a hash table for set {h x : x € X}), and if this is the case 
sets s a to the corresponding root and marks a. 

(b) If a is marked, then P\ parses Gen(s a ) to obtain r' 0 , r[, r' 0 , r[. 

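(c) $P_1$ checks that $e^0_\alpha = E_{pk}(((s_\alpha)^2 \bmod q') \cdot g^{r'_0 \cdot Q_{h_0(x_\alpha)}(x_\alpha)}; \tilde r'_0)$ and
$e^1_\alpha = E_{pk}(((s_\alpha)^2 \bmod q') \cdot g^{r'_1 \cdot Q_{h_1(x_\alpha)}(x_\alpha)}; \tilde r'_1)$, i.e., that they result from applying
the homomorphic operations to the encrypted polynomials with randomness $\tilde r'_0, \tilde r'_1$. If all checks
succeed, $P_1$ records $x_\alpha$ as part of her output.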
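The polynomial encoding used in steps 4 to 7 can be checked in plaintext. The following is a minimal sketch under stated assumptions, not the protocol itself: `bin_index` is a hypothetical SHA-256 based stand-in for $h_0, h_1$, coefficients live in a toy prime field of size 1019 (elements are assumed distinct and smaller than that), and there is no encryption, proof or masking. It illustrates why the exponent $r \cdot Q_h(y)$ vanishes when $y$ was placed in bin $h$ and is nonzero otherwise.

```python
import hashlib

FIELD = 1019  # toy prime modulus for the coefficients (illustrative, not a real parameter)


def bin_index(which, x, num_bins):
    """Hypothetical stand-in for h_0 / h_1: SHA-256 of (which, x), reduced mod the bin count."""
    digest = hashlib.sha256(f"{which}:{x}".encode()).digest()
    return int.from_bytes(digest, "big") % num_bins


def assign_bins(X, num_bins):
    """Step 4: put each x into the less occupied of its two candidate bins (ties go to h_0)."""
    bins = [[] for _ in range(num_bins)]
    for x in X:
        b0, b1 = bin_index(0, x, num_bins), bin_index(1, x, num_bins)
        target = b0 if len(bins[b0]) <= len(bins[b1]) else b1
        bins[target].append(x)
    return bins


def poly_from_roots(roots, M):
    """Coefficients Q_{i,0..M} (lowest degree first) of prod_{b in B_i} (x - b), zero-padded.

    An empty bin yields the constant polynomial 1, as in footnote 4.
    """
    coeffs = [1]
    for root in roots:
        nxt = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] = (nxt[k] - root * c) % FIELD    # contribution of -root * x^k
            nxt[k + 1] = (nxt[k + 1] + c) % FIELD   # contribution of x * x^k
        coeffs = nxt
    if len(coeffs) > M + 1:
        raise ValueError("bin holds more than M elements")
    return coeffs + [0] * (M + 1 - len(coeffs))


def evaluate(coeffs, y):
    """Horner evaluation of Q_i(y) mod FIELD."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * y + c) % FIELD
    return acc


if __name__ == "__main__":
    X, num_bins, M = [3, 17, 42, 99, 512], 4, 5
    bins = assign_bins(X, num_bins)
    polys = [poly_from_roots(b, M) for b in bins]
    for y in [17, 7]:
        hits = [evaluate(polys[bin_index(w, y, num_bins)], y) == 0 for w in (0, 1)]
        print(y, "in X" if any(hits) else "not in X")
```

Running the file prints `17 in X` and then `7 not in X`: a non-member never zeroes either candidate polynomial, because every factor $(y - b)$ is nonzero modulo the prime.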
Abstract. This paper presents an efficient protocol for securely computing
the fundamental problem of pattern matching. This problem is
defined in the two-party setting, where party $P_1$ holds a pattern and
party $P_2$ holds a text. The goal of $P_1$ is to learn where the pattern
appears in the text, without revealing it to $P_2$ or learning anything else
about $P_2$’s text. Our protocol is the first to address this problem with full
security in the face of malicious adversaries. The construction is based
on a novel protocol for secure oblivious automata evaluation which is of
independent interest. In this problem party $P_1$ holds an automaton and
party $P_2$ holds an input string, and they need to decide if the automaton
accepts the input, without learning anything else.

1 Introduction 

Secure two-party computation is defined as the joint computation of some function
over private inputs using a communication protocol, satisfying at least privacy
(no other information is revealed beyond the output of the function) and
correctness (the correct output is computed). Today’s standard definition (cf.
[1], following [2,3,4]) formalizes security by comparing the execution of such a
protocol to an “ideal execution” where a trusted third party helps the parties 
compute the function. Specifically, in the ideal world the parties just send their 
inputs over perfectly secure communication lines to a trusted party, who then 
computes the function honestly and sends the output to the designated party. 
Informally, the real protocol is defined to be secure if all adversarial attacks on 
a real protocol can also be carried out in the ideal world. In the ideal world, 
the adversary can do almost nothing, and this guarantees that the same is also 
true in the real world. This definition of security is often called simulation-based 
because security is demonstrated by showing that a real protocol execution can 
be “simulated” in the ideal world. 

Secure two-party computation has been extensively studied, and it is known
that any efficient two-party functionality can be securely computed. However,
these are just feasibility results that demonstrate secure computation is
possible, in principle, though not necessarily in practice. One reason is that the
results mentioned above are generic, i.e., they do not exploit any structural
properties of the specific function being computed. A long series of research efforts
has focused on finding efficient protocols for specific functions: constructing
such protocols is crucial if secure computation is ever to be used in practice.

Our Contribution. In this paper we focus on the following problems:

— Secure Pattern Matching. We look at the basic problem of pattern matching.
In this problem, one party holds a text $T$ and the other party holds a pattern
$p$, where $|T|$ and $|p|$ are mutually known. The aim is for the party holding the
pattern to learn all the locations of the pattern in the text (and there may
be many) while the other party learns nothing about the pattern.

— Oblivious Automata Evaluation. To solve the above problem we consider the
approach of [5], which reduces the pattern matching problem to the composition
of a pattern-specific automaton $\Gamma$ with the text $T$. We develop a
protocol for securely computing the evaluation of $\Gamma$ on $T$.

The problem of pattern matching has been widely studied for decades due to its
numerous applications. Yet, the problem of pattern matching in a secure setting
has not received similar attention. Our starting point is an extremely efficient
protocol that computes this function in the “honest-but-curious” setting.¹ This
solution can be extended to one-sided simulation, i.e., to full security even when
the party holding the pattern is corrupted. This first protocol is independent of our
protocol for the malicious setting, and is comparable to the one-sided simulatable
protocol of [9]. However, while both protocols reach the same asymptotic
complexity, our protocol is much more practical since the concrete constants
involved are much smaller ([5] requires $|p|$ oblivious transfers). Moreover, our
protocol can easily be extended to related problems such as approximate
text search or text search with wildcards. The malicious setting introduces many
subtleties beyond those considered in the previous settings and requires
a different technique. This includes novel sub-protocols
such as, for example, a protocol to prove that a correct pattern-specific automaton
was constructed. We note that our protocols are the first efficient ones in
the literature to achieve full simulatability for these problems with malicious
adversaries. Security is based on the El Gamal encryption scheme [10,11] and
thus requires a relatively small security parameter (although any additively
homomorphic threshold encryption scheme with secure two-party distributed protocols to
generate shared keys and perform decryptions would work).

Motivation. Consider a hospital holding a DNA database of all the participants
in a research study, and a researcher wanting to determine the frequency of the
occurrence of a specific gene. This is a classical pattern matching application,
which is however complicated by privacy considerations. The hospital may be
forbidden from releasing the DNA records to a third party. Likewise, the
researcher may not want to reveal what specific gene she is looking for, nor trust
the hospital to perform the search correctly.

¹ In this setting, an adversary follows the protocol specification but may try to examine
the messages it receives to learn more than it should.

It would seem that basic honest-but-curious solutions (already present in the
literature, see below) would work here. However, the parties may be motivated
to produce invalid results, so a proof of accurate output might be as important
as the output itself. Moreover, there is also a need to make sure that the data
on which the protocol is run is valid. For example, a rogue hospital could sell
“fake” DNA databases for research purposes. Perhaps some trusted certification
authorities might one day pre-certify a database as being valid for certain
applications. Then, the security properties of our protocol could guarantee that
only valid data is used in the pattern matching protocol. (The first step of our
protocol is for the hospital to publish an encryption of the data; this could be
replaced by the publication of encrypted data that was certified as correct.)

Related work. The problem of secure pattern matching was studied by Hazay
and Lindell in [5], who used oblivious pseudorandom function (PRF) evaluation
to evaluate every block of size $m$ bits. However, their protocol achieves only a
weaker notion of security called one-sided simulatability, which guarantees
privacy in all cases and requires that one of the two parties is never corrupted
to guarantee correctness. It is tempting to think that a protocol for computing
oblivious PRF evaluation with a committed key (where it is guaranteed that the
same key is used for all PRF evaluations) for malicious adversaries [12] suffices
for malicious security. Unfortunately, this is not the case since the inputs to the
PRF must be consistent and it is not clear how to enforce this. Namely, for every
$i$, the last $m - 1$ bits of the $i$th block are supposed to be the first $m - 1$ bits of the
following block. The idea to use oblivious automata evaluation to achieve secure
pattern matching originates in [13]. Their protocols, however, are only secure in
the honest-but-curious setting. We improve their results to tolerate a malicious
adversary.

The efficiency of our protocol. When presenting a two-party protocol for the
secure computation of a specific function, one has to make sure that the
resulting protocol is indeed more efficient than the known “generic” solutions for
secure two-party computation of any function. We compare our protocols to the
two most efficient generic two-party protocols secure against malicious
adversaries, both based on the circuit garbling technique by Yao [5]. Recall that Yao’s
protocol (which is secure against semi-honest players) uses a Boolean circuit to
compute the function, and its computational complexity is linear in the size of
the circuit.

— One result, in [33], makes Yao’s protocol resistant to malicious adversaries by using a
binary cut-and-choose strategy. This requires running $s$ copies of Yao’s
protocol, where $s$ is a statistical security parameter that must be large enough
that the resulting error probability, which decreases exponentially in $s$, is sufficiently small.
This requires $O(s|C| + s^2 m)$ symmetric-key encryptions, and this communication
overhead, as shown by [15], is a major obstacle.
— The other general result, from [TB], uses a special form of encryption for
garbling and performs efficient zero-knowledge (ZK) proofs over that encryption
scheme. The protocol requires a common reference string (CRS) which
consists of a strong RSA modulus. To the best of our knowledge, there are
currently no efficient techniques for generating a shared strong RSA
modulus without external help. Furthermore, their protocol requires
approximately 720 RSA exponentiations per gate, where these operations are
computed modulo a 2048-bit number due to the use of Paillier’s encryption scheme. This
means that the bandwidth of [TB] is relatively high as well.

2 Tools and Definitions 

Throughout the paper, we denote the security parameter by n. Although not 
explicitly specified, input lengths are always assumed to be bounded by some 
polynomial in n. A probabilistic machine is said to run in polynomial-time (ppt) 
if it runs in time that is polynomial in the security parameter n alone. 

A function $\mu(\cdot)$ is negligible in $n$ (or simply negligible) if for every
polynomial $p(\cdot)$ there exists a value $N$ such that $\mu(n) < \frac{1}{p(n)}$ for all $n > N$; i.e.,
$\mu(n) = n^{-\omega(1)}$. Let $X = \{X(n,a)\}_{n \in \mathbb{N}, a \in \{0,1\}^*}$ and $Y = \{Y(n,a)\}_{n \in \mathbb{N}, a \in \{0,1\}^*}$ be
distribution ensembles. We say that $X$ and $Y$ are computationally
indistinguishable, denoted $X \stackrel{c}{\equiv} Y$, if for every polynomial non-uniform distinguisher $D$ there
exists a negligible $\mu(\cdot)$ such that

$$\left|\Pr[D(X(n,a)) = 1] - \Pr[D(Y(n,a)) = 1]\right| < \mu(n)$$

for every $n \in \mathbb{N}$ and $a \in \{0,1\}^*$.

Due to space considerations we defer the formal definitions of two-party
secure computation in the presence of malicious adversaries and of one-sided
simulation security to the extended version of this paper.

Zero-knowledge proofs. Our protocols use several standard zero-knowledge
proofs, summarized in Table 1.

Table 1. Zero-knowledge proofs referenced by our protocols

Protocol | Relation/Language
$\pi_{\mathrm{DL}}$ | $\mathcal{R}_{\mathrm{DL}} = \{((\mathbb{G}, g, h), x) \mid h = g^x\}$
$\pi_{\mathrm{DDH}}$ | $\mathcal{R}_{\mathrm{DDH}} = \{((\mathbb{G}, g, g_1, g_2, g_3), x) \mid g_1 = g^x \wedge g_3 = g_2^x\}$
$\pi_{\mathrm{NZ}}$ | $L_{\mathrm{NZ}} = \{(\mathbb{G}, g, h, (\alpha, \beta)) \mid \exists\, (m \neq 0, r) \text{ s.t. } \alpha = g^r, \beta = h^r g^m\}$

We also employ the following additional zero-knowledge proofs.

1. A zero-knowledge proof of knowledge $\pi_{\mathrm{ENC}}$ for the following relation: Let
$C_i = [c_{i,1}, \ldots, c_{i,m}]$ for $i \in \{0,1\}$ and $C' = [c'_1, \ldots, c'_m]$ be three vectors of
$m$ ciphertexts each. We want to prove that $C'$ is the “re-encryption” of the
same messages encrypted in either $C_0$ or $C_1$; in other words, that there
exists an index $i \in \{0,1\}$ such that for all $j$, $c'_j$ was obtained by multiplying
$c_{i,j}$ by a random encryption of 0. More formally,

$$\mathcal{R}_{\mathrm{ENC}} = \{((\mathbb{G}, g, m, C_0, C_1, C'), (i, \{r_j\}_j)) \mid \text{for all } j: c'_j = c_{i,j} \cdot E_{pk}(0; r_j)\}.$$

This involves the parties computing the ciphertexts $c_0 = \prod_{j=1}^{m} (c_{0,j} \cdot (1/c'_j))^{r_{0,j}}$
and $c_1 = \prod_{j=1}^{m} (c_{1,j} \cdot (1/c'_j))^{r_{1,j}}$, where the sets $\{r_{0,j}\}_j$ and $\{r_{1,j}\}_j$ are
public randomness. The prover then proves that either $(pk, c_0)$ or $(pk, c_1)$ is
a Diffie-Hellman tuple.

2. Let $C = \{c_{j,i}\}_{j,i}$ and $C' = \{c'_{j,i}\}_{j,i}$ be two sets of encryptions, where
$j \in \{1, \ldots, |Q|\}$ and $i \in \{0,1\}$. Then we consider a zero-knowledge proof
of knowledge $\pi_{\mathrm{PERM}}$ for proving that $C$ and $C'$ correspond to the same
decryption vector up to some random permutation. Meaning that,

$$\mathcal{R}_{\mathrm{PERM}} = \{((pk, C, C'), (\pi, \{r_{j,i}\}_{j,i})) \mid \forall i, \{c'_{j,i} = c_{\pi(j),i} \cdot E_{pk}(0; r_{j,i})\}_j\}$$

where $\pi$ is a random one-to-one mapping over the elements $\{1, \ldots, |Q|\}$.
Basically, we prove that $C'$ is obtained from $C$ by randomizing all the
ciphertexts and permuting the indices (i.e., the columns). We require that the
same permutation is applied to both vectors. The problem in which a single
vector of ciphertexts is randomized and permuted is defined by

$$\mathcal{R}'_{\mathrm{PERM}} = \{(((c_1, \ldots, c_Q), (\tilde c_1, \ldots, \tilde c_Q), pk), (\pi, (r_1, \ldots, r_Q))) \mid \forall i, \tilde c_i = c_{\pi(i)} \cdot E_{pk}(0; r_i)\}$$

and has been widely studied. The state-of-the-art protocol is in [20]. We
use a simpler, though slightly less efficient (but still good for our purposes)
protocol from [21], which is an efficient zero-knowledge proof $\pi'_{\mathrm{PERM}}$ for
$\mathcal{R}'_{\mathrm{PERM}}$ with linear computation and communication complexity and a constant
number of rounds. We use this slightly less efficient protocol because its
proof applies to the case where the same permutation is applied to multiple
vectors of ciphertexts.
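The statements proved by $\pi_{\mathrm{ENC}}$ and $\pi_{\mathrm{PERM}}$ can be exercised on toy data. The sketch below assumes a textbook “El Gamal in the exponent” over a deliberately tiny safe-prime group ($p = 2039$, $q = 1019$, $g = 4$, chosen here for illustration only) and uses hypothetical helper names. It checks the statements directly with the secret key; it does not implement the zero-knowledge proofs themselves, only the algebra they certify: the random linear combination behind $\pi_{\mathrm{ENC}}$ encrypts 0 for the correct branch, and a shuffle in the sense of $\mathcal{R}_{\mathrm{PERM}}$ applies one permutation to every column.

```python
import random
import secrets

# Toy "El Gamal in the exponent" over the order-q subgroup of Z_p^*, with p = 2q + 1.
# These parameters are far too small to be secure; they only make the algebra visible.
P, Q, G = 2039, 1019, 4


def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return pow(G, x, P), x                              # (h = g^x, x)


def enc(h, m, r=None):
    r = secrets.randbelow(Q) if r is None else r
    return (pow(G, r, P), pow(h, r, P) * pow(G, m % Q, P) % P)   # (g^r, h^r g^m)


def mul(c, d):
    return (c[0] * d[0] % P, c[1] * d[1] % P)          # plaintexts are added


def inv(c):
    return (pow(c[0], P - 2, P), pow(c[1], P - 2, P))  # plaintext is negated


def power(c, e):
    return (pow(c[0], e, P), pow(c[1], e, P))          # plaintext is multiplied by e


def dec(x, c):
    return c[1] * pow(c[0], Q - x, P) % P               # g^m, which is 1 iff m = 0 mod q


def reencrypt(h, c):
    return mul(c, enc(h, 0))


def encrypts_same(h, x, cb, c_prime, coins):
    """Batched check behind pi_ENC: prod_j (c_{b,j} / c'_j)^{r_{b,j}} should encrypt 0.

    Evaluated with the secret key here; in the protocol the prover instead shows in
    zero knowledge that (pk, c_b) is a Diffie-Hellman tuple for one of b = 0, 1.
    """
    acc = (1, 1)                                         # trivial encryption of 0
    for c, cp, r in zip(cb, c_prime, coins):
        acc = mul(acc, power(mul(c, inv(cp)), r))
    return dec(x, acc) == 1


def shuffle_columns(h, columns):
    """The R_PERM statement: one random permutation pi for every column, plus re-randomization."""
    n = len(columns[0])
    perm = list(range(n))
    random.SystemRandom().shuffle(perm)
    shuffled = [[reencrypt(h, col[perm[j]]) for j in range(n)] for col in columns]
    return shuffled, perm                                # shuffled[i][j] re-encrypts columns[i][perm[j]]
```

With nonzero public coins, a wrong branch that differs in exactly one position can never pass the batched test, since its exponent is a single nonzero coin times a nonzero difference modulo the prime $q$.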

3 Secure Text Search Protocols 

Text search involves scanning a text sequentially, looking for instances of a
particular pattern. Efficient text search requires an analysis of the pattern string that
enables scanning in amortized $O(1)$ time per text character, skipping over regions of
the text where a match is provably impossible, as in KMP [5], which we employ here.

Pattern matching is defined as follows: given a binary string $T$ of length $\ell$ and
a binary pattern $p$ of length $m$, find all the locations in the text where pattern
$p$ appears. Stated differently, for every $i = 1, \ldots, \ell - m + 1$, let $T_i$
be the substring of length $m$ that begins at the $i$th position in $T$. Then, the
basic problem of pattern matching is to return the set $\{i \mid T_i = p\}$. Formally, we
consider the functionality $\mathcal{F}_{\mathrm{PM}}$ defined by

$$\mathcal{F}_{\mathrm{PM}}(p, T) = (\{i \mid T_i = p\}, \lambda)$$

where $\lambda$ denotes the empty string.

Note that $P_2$, who holds the text, learns nothing about the pattern held by $P_1$,
and the only thing that $P_1$ learns about the text held by $P_2$ is the locations
where its pattern appears. Our starting point is an extremely efficient protocol
that computes $\mathcal{F}_{\mathrm{PM}}$ in the “honest-but-curious” setting, where the adversary
follows the protocol specification but tries to gain useful information about the
honest party’s input.

3.1 “Honest-But-Curious” Secure Text Search 

The protocol shown in Fig. 1 uses homomorphic encryption to ensure the privacy
of the two parties’ inputs. Informally, party $P_1$ computes a matrix $\Phi$ of size
$2 \times m$ that includes an encryption of zero in position $(i, j)$ if $p_j = i$, or an
encryption of one otherwise. Given $\Phi$, party $P_2$ creates a new encryption $e_k$
for every text location $k$, computed as the product of the encryptions at
locations $(t_{k+j-1}, j)$ for all $j \in \{1, \ldots, m\}$. Since $e_k$ encrypts the Hamming distance
between $p$ and $T_k$, $e_k$ is a random encryption of zero if and only if $p$ matches $T_k$. We
remark that even though we consider here the problem of exact matching, this
solution can easily be applied to the problem of text search with mismatches, or
to larger alphabets.

Formally,

Protocol 1. $\pi_{\mathrm{SIMPLE}}$

— Inputs: The input of $P_1$ is a binary search string $p = p_1, \ldots, p_m$ and the input of $P_2$ is a
binary text string $T = t_1, \ldots, t_\ell$.


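$P_1(p)$: for all $i \in \{0,1\}$, $j \in \{1, \ldots, m\}$: $\Phi(i,j) = E_{pk}(0)$ for $i = p_j$, and $\Phi(1-i,j) = E_{pk}(1)$.
$P_1 \to P_2$: $\Phi$.
$P_2(T)$: for all $k \in \{1, \ldots, \ell - m + 1\}$: $e_k = \left(\prod_{j=1}^{m} \Phi(t_{k+j-1}, j)\right)^{r_k}$.
$P_2 \to P_1$: $\{e_k\}_k$.
$P_1$: output $\{k \mid D_{sk}(e_k) = 0\}$.

Fig. 1. Text search in the honest-but-curious setting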






338 R. Gennaro, C. Hazay, and J.S. Sorensen 


— Conventions: The parties jointly agree on a group $\mathbb{G}$ of prime order $q$
and a generator $g$ for the El Gamal encryption. Party $P_1$ generates a key
pair $(pk, sk) \leftarrow G$ and publishes $pk$. Finally, unless written differently, $j \in
\{1, \ldots, m\}$ and $i \in \{0,1\}$.

The protocol:

1. Encryption of pattern. Party $P_1$ builds a $2 \times m$ matrix of ciphertexts
$\Phi$ defined by

$$\Phi(i,j) = \begin{cases} E_{pk}(0; r) & p_j = i \\ E_{pk}(1; r) & \text{otherwise} \end{cases}$$

where each $r$ is a uniformly chosen random value of appropriate length.
The matrix $\Phi$ is sent to party $P_2$.

2. Scanning of text. For each offset $k \in \{1, \ldots, \ell - m + 1\}$, $P_2$ computes

$$e_k = \prod_{j=1}^{m} \Phi(t_{k+j-1}, j).$$

Then for each offset $k$, it holds that $T_k$ matches pattern $p$ if and only if
$D_{sk}(e_k) = 0$.

3. Masking of terms. Since the decryption of $e_k$ reveals
the number of matched elements at text location $k$, party $P_2$ masks
this result through scalar multiplication. In particular, $P_2$ sends the set
$\{e'_k = (e_k)^{r_k}\}_k$, where $r_k$ is a random value chosen independently for
each $k$.

4. Obtaining result. $P_1$ uses $sk$ to decrypt the values $e'_k$ and outputs

$$\{k \mid D_{sk}(e'_k) = 0\}.$$
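A toy run of $\pi_{\mathrm{SIMPLE}}$ makes the Hamming-distance argument concrete. This is a sketch under assumptions: a textbook exponential El Gamal over a tiny, insecure group ($p = 2039$, $q = 1019$, $g = 4$, chosen only for illustration), a single key pair held by $P_1$, and hypothetical function names for the protocol steps. Because the Hamming distance is smaller than $q$ and each $r_k$ is nonzero, a masked ciphertext decrypts to $g^0 = 1$ exactly at the matching offsets.

```python
import secrets

P, Q, G = 2039, 1019, 4   # toy safe-prime group (p = 2q + 1); insecure, illustration only


def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return pow(G, x, P), x


def enc(h, m):
    r = secrets.randbelow(Q)
    return (pow(G, r, P), pow(h, r, P) * pow(G, m % Q, P) % P)   # E(m; r) = (g^r, h^r g^m)


def mul(c, d):
    return (c[0] * d[0] % P, c[1] * d[1] % P)


def power(c, e):
    return (pow(c[0], e, P), pow(c[1], e, P))


def dec(x, c):
    return c[1] * pow(c[0], Q - x, P) % P     # returns g^m


def encrypt_pattern(h, p):
    """Step 1: Phi[i][j] = E(0) if p_j == i else E(1), for i in {0, 1} and every position j."""
    return [[enc(h, 0 if p[j] == i else 1) for j in range(len(p))] for i in (0, 1)]


def scan_text(h, phi, T):
    """Steps 2-3: e_k = prod_j Phi(t_{k+j-1}, j) encrypts the Hamming distance; mask with r_k."""
    m = len(phi[0])
    masked = []
    for k in range(len(T) - m + 1):
        e = (1, 1)                              # trivial encryption of 0
        for j in range(m):
            e = mul(e, phi[T[k + j]][j])
        r_k = secrets.randbelow(Q - 1) + 1      # nonzero, so a nonzero distance stays nonzero
        masked.append(power(e, r_k))
    return masked


def obtain_result(x, masked):
    """Step 4: report the 1-based offsets k with D_sk(e'_k) = 0 (that is, g^0 = 1 here)."""
    return [k + 1 for k, c in enumerate(masked) if dec(x, c) == 1]
```

For example, with $p = 101$ and $T = 10101101$ the run reports the offsets 1, 3 and 6, the same set a naive scan returns.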


Clearly, if both parties are honest then $P_1$ outputs a correct set of indices with
overwhelming probability (an error may occur with negligible probability if $(e_k)^{r_k}$
is an encryption of zero even though $e_k$ is not), as the parties execute a naive
solution for $\mathcal{F}_{\mathrm{PM}}$. We therefore state the following.

Theorem 1. Assume that $(G, E, D)$ is the semantically secure El Gamal encryption
scheme. Then protocol $\pi_{\mathrm{SIMPLE}}$ securely computes $\mathcal{F}_{\mathrm{PM}}$ in the presence
of honest-but-curious adversaries.

The proof is straightforward via a reduction to the security of (G, E, D) and is 
therefore omitted. 

Furthermore, if party $P_1$ proves that it computed the matrix $\Phi$ correctly, we
can also guarantee full simulation with respect to a corrupted $P_1$. This can
be achieved by having $P_1$ prove, for every $j$, that $(\Phi(0,j), \Phi(1,j))$ is a permuted pair
of the encryptions $(E_{pk}(0), E_{pk}(1))$, using $\pi_{\mathrm{PERM}}$. Constructing a simulator for the
case of a corrupted $P_2$ is more challenging since the protocol does not guarantee
that $P_2$ computes $\{e'_k\}_k$ relative to a well-defined bit string $T$. In particular, it
may compute every encryption $e'_k$ using a different length-$m$ string. Thus, only
privacy is guaranteed for this case. Let $\pi'_{\mathrm{SIMPLE}}$ denote the modified version of
$\pi_{\mathrm{SIMPLE}}$ with the additional zero-knowledge proof of knowledge $\pi_{\mathrm{PERM}}$ by $P_1$. We
conclude with the following claim.

Theorem 2. Assume that $(G, E, D)$ is the semantically secure El Gamal encryption
scheme. Then protocol $\pi'_{\mathrm{SIMPLE}}$ securely computes $\mathcal{F}_{\mathrm{PM}}$ with one-sided
simulation.

The proof sketch is in the extended version of this paper. 

Efficiency. We first note that the protocol $\pi_{\mathrm{SIMPLE}}$ is constant round. The overall
communication cost is that of sending $O(m + \ell)$ group elements, and the computation
cost is that of performing $O(m + \ell)$ modular exponentiations, as $P_1$ sends the
table $\Phi$ and $P_2$ replies with a collection of $\ell$ encryptions. The additional cost of
$\pi_{\mathrm{PERM}}$ is linear in the length of the pattern.

The fact that this protocol does not seem to be naturally extendable for the 
malicious setting has led us to search for the techniques presented in the next 
section. 

3.2 Secure Text Search in the Presence of Malicious Adversaries 

We consider a secure version of the KMP algorithm [5], which searches for
occurrences of a pattern $p$ within a text $T$ by exploiting the observation that when a
mismatch occurs, the pattern itself embodies sufficient information to determine
where the next match could begin, thus bypassing re-examination of previously
matched characters. More formally, $P_1$, whose input is a pattern $p$, constructs an
automaton $\Gamma_p$ as follows. We denote by $p^{(i)}$ the length-$i$ prefix $p_1, \ldots, p_i$ of $p$. $P_1$
first constructs a table $\Upsilon$ with $m$ entries, where the $i$th entry denotes the largest
prefix of $p$ that matches a proper suffix of $p^{(i-1)}$. This table (as in Fig. 2) maintains
the appropriate partial-match state if a mismatch occurs in the $i$th bit of $p$. For
each bit of the text, the algorithm indicates whether the input read so far reaches the final
state, i.e., it outputs one bit for each text location.


State | Prefix     | Fail prefix | Fail state | Γ(q_i, 0) | Γ(q_i, 1)
q1    | ε          |             |            | q1        | q2
q2    | 1          | ε           | q1         | Γ(q1, 0)  | q3
q3    | 11         | 1           | q2         | q4        | Γ(q2, 1)
q4    | 110        | ε           | q1         | q5        | Γ(q1, 1)
q5    | 1100       | ε           | q1         | q6        | Γ(q1, 1)
q6    | 11000      | ε           | q1         | Γ(q1, 0)  | q7
q7    | 110001     | 1           | q2         | Γ(q2, 0)  | q8
q8    | 1100011    | 11          | q3         | q9        | Γ(q3, 1)
q9    | 11000110   | 110         | q4         | q10       | Γ(q4, 1)
q10   | 110001100  | 1100        | q5         | Γ(q5, 0)  | q11
q11   | 1100011001 | 1           | q2         | Γ(q2, 0)  | Γ(q2, 1)

Fig. 2. Construction of determinized KMP automaton for pattern 1100011001
We remark that $\Upsilon$ can easily be constructed in time $O(m^2)$ by comparing
$p$ against itself at every alignment. $P_1$ constructs its automaton $\Gamma_p$ based on
$\Upsilon$. It first sets $|Q| = |p| + 1$ and constructs the transition table $\Delta$ as follows.
For all $i \in \{1, \ldots, m\}$, $\Delta(q_i, p_i) = q_{i+1}$ and $\Delta(q_i, 1 - p_i) = \Upsilon(i)$, where
$\Upsilon(i)$ denotes the $i$th entry in $\Upsilon$ (we denote the labels of the states in $Q$ by the
sequential integers 1 to $m + 1$; this way, if there is no matching
prefix in the $i$th entry, the automaton goes back to the initial state $q_1$). $P_1$
concludes the construction by setting $F = \{q_{m+1}\}$.
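The construction of $\Upsilon$ and $\Gamma_p$ can be sketched in a few lines. This is an illustration, not the paper’s code: the function names are hypothetical, $\Upsilon$ is stored as border lengths rather than state labels (state label = length + 1), and mismatch transitions follow the determinized table of Fig. 2, i.e., $\Gamma(q_i, 1 - p_i) = \Gamma(\text{fail state of } q_i, 1 - p_i)$. On the pattern 1100011001 it reproduces the fail states and transitions of Fig. 2.

```python
def failure_table(p):
    """border[n] = length of the longest proper prefix of p that is a suffix of p[:n].

    Computed by comparing p against itself at every alignment, O(m^2) overall.
    """
    m = len(p)
    agree = [0] * m                      # agree[s] = |longest common prefix of p and p[s:]|
    for s in range(1, m):
        while s + agree[s] < m and p[agree[s]] == p[s + agree[s]]:
            agree[s] += 1
    border = [0] * (m + 1)
    for n in range(2, m + 1):
        for s in range(1, n):            # the smallest shift that still covers p[:n] wins
            if s + agree[s] >= n:
                border[n] = n - s
                break
    return border


def kmp_automaton(p):
    """Determinized KMP automaton in the style of Fig. 2.

    States are labelled 1..m+1 (state i means i-1 pattern bits are matched),
    q_1 = 1 is initial and F = {m + 1}.
    """
    m = len(p)
    border = failure_table(p)
    delta = {}
    for i in range(1, m + 2):
        matched = i - 1
        fail = border[matched] + 1       # fail state of q_i (unused for q_1)
        for b in (0, 1):
            if matched < m and b == p[matched]:
                delta[(i, b)] = i + 1                 # extend the partial match
            elif matched == 0:
                delta[(i, b)] = 1                     # q_1 loops on a mismatch
            else:
                delta[(i, b)] = delta[(fail, b)]      # Gamma(fail state, b), already defined
    return delta, {m + 1}


def match_positions(delta, final, m, text):
    """Run the automaton; each visit to F ends a match. Returns 1-based start offsets."""
    state, starts = 1, []
    for pos, b in enumerate(text, start=1):
        state = delta[(state, b)]
        if state in final:
            starts.append(pos - m + 1)
    return starts
```

For instance, `kmp_automaton([1, 1, 0, 0, 0, 1, 1, 0, 0, 1])` maps `(10, 0)` to 6, matching $\Gamma(q_{10}, 0) = \Gamma(q_5, 0) = q_6$ in Fig. 2, and scanning the pattern followed by its last nine bits reports matches starting at offsets 1 and 10.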

In the next section we show a general protocol, $\pi_{\mathrm{AUTO}}$, that performs a secure and
oblivious evaluation of $P_1$’s automaton on $P_2$’s text. The protocol works for any
automaton (not just a KMP one) and may therefore be of independent interest.
After presenting the automata evaluation protocol, we also show how to prove in
zero knowledge that the automaton $P_1$ constructs is a correct KMP automaton.

3.3 Secure Oblivious Automata Evaluation 

In this section we present a secure protocol for oblivious automata evaluation in
the presence of malicious adversaries. In this functionality $P_1$ inputs a description
of an automaton $\Gamma$, and $P_2$ inputs a string $t$. The result of the protocol is that
$P_1$ learns whether $\Gamma$ accepts $t$, while $P_2$ learns nothing. Formally, we define this problem via
the functionality

$$\mathcal{F}_{\mathrm{AUTO}}(\Gamma, t) = \begin{cases} (\text{accept}, \lambda) & \text{if } \Gamma(t) \in F \\ (\text{no-accept}, \lambda) & \text{otherwise} \end{cases}$$

where $\lambda$ is the empty string (denoting that $P_2$ does not receive an output) and
$\Gamma(t)$ denotes the final state in the evaluation of $\Gamma$ on $t$. For a binary input, the
transition table contains $|Q|$ rows and two columns. Furthermore, we assume that
the names of the states are the integers $\{1, \ldots, |Q|\}$. For simplicity, we assume
that $|Q|$ and $|F|$ are public, since this information is public
anyway when reducing the problem of pattern matching to oblivious automata
evaluation. For the sake of generality, we note that keeping $|F|$ private can
easily be handled by having $P_1$ send a vector of encryptions whose $i$th encryption
is an encryption of zero if $q_i \notin F$, and an encryption of $q_i$ otherwise (this can be
verified using a simple zero-knowledge proof). The final verification can then be done
by checking membership in this set using the techniques described below.

Recall that our starting point is the protocol from [13]. Their idea is to have
the parties share the current machine state, such that by the end of the $k$th
iteration the party with the automaton knows a random string $r_k$ whereas the
party with the text learns $q_k + r_k$. The parties complete each iteration by
running an oblivious transfer in which the next state is again shared between them.
The honest-but-curious setting significantly simplifies their construction.
Unfortunately, we see no natural way to extend their technique to the
malicious case (even using oblivious transfers resilient to malicious attacks).

A high-level description. We begin by briefly motivating our construction; see
Fig. 3 as well. Loosely speaking, at the beginning of the protocol $P_1$ and $P_2$ jointly
generate a public key $(G, E, D)$ for the threshold El Gamal encryption scheme
(denoted by the sub-protocol $\pi_{\mathrm{KEY}}$). Next, party $P_1$ encrypts its transition table
$\Delta$ and the set of accepting states $F$, and sends them to $P_2$. This allows $P_2$ to
find the encryption of the next state $c_1 = E_{pk}(\Delta(1, t_1))$ by selecting it from the
encrypted matrix. $P_2$ re-randomizes this encryption and shows it to $P_1$. The
protocol continues in this fashion for the whole text, with $\ell$ iterations.²

$P_1(Q, \{0,1\}, \Delta, q_1, F)$ and $P_2(t_1, \ldots, t_\ell)$ run $\pi_{\mathrm{KEY}}$; $P_1$ obtains $(pk, sk_1)$ and $P_2$ obtains $(pk, sk_2)$.
$P_1 \to P_2$: $\{c_{j,i} = E_{pk}(\Delta(j,i))\}_{j,i}$, $\{E_{pk}(f)\}_{f \in F}$, and a ZKPOK of $\Delta, F$.
for $\ell' \in \{1, \ldots, \ell\}$:
  let $c_{\ell'-1} = E_{pk}(\Delta(1, (t_1, \ldots, t_{\ell'-1})))$;
  $P_2$: $C = \{c_{\ell'-1}/c_{j,\epsilon}\}_j$, $C' = \{c_{j,t_{\ell'}}\}_j$, and a permutation $\pi$; $P_2 \to P_1$: $\pi(C), \pi(C')$;
  $P_1$: chooses a permutation $\pi'$; $P_1 \to P_2$: $\pi'(\pi(C)), \pi'(\pi(C'))$;
  $P_1 \leftrightarrow P_2$: masking of the permuted $C$, then threshold decryption of the masked vector to find the index of the next state;
  result: $c_{\ell'} = E_{pk}(\Delta(1, (t_1, \ldots, t_{\ell'})))$.
end for
$P_1 \leftrightarrow P_2$: ZK of validity; verify that $c_\ell$ is an accepting state.

Fig. 3. A high-level diagram of $\pi_{\mathrm{AUTO}}$
² Unfortunately, these iterations are not independent. Each requires an encryption of
the current state, and thus they cannot be performed in parallel. We show in Sect. 4 how
to reduce the number of rounds to $O(|Q|)$ when performing a secure text search,
which is typically quite small.



















At the beginning of each iteration, the parties know a randomized encryption
of the current state, and their goal is to find an encryption of the next state. At
iteration $i$, $P_2$ selects from the matrix the entire encrypted column of all possible
$|Q|$ next states for its input $t_i$ (only knowing an encryption of the current state).
Then, using the homomorphic properties of El Gamal, the parties obliviously
select the correct next state: Let $c_{i-1}$ denote an encryption of $\Delta(1, t_1, \ldots, t_{i-1})$.
The parties first compute the set $C = \{c_{i-1}/c_{j,\epsilon}\}_j$, where $\{c_{j,\epsilon}\}_j$ correspond to
encryptions of the labels in $Q$; see below for more details. Only one ciphertext
in this set will be an encryption of 0; it indicates the position corresponding to
the current state. The protocol concludes with the parties jointly checking whether the
encrypted state produced in the final iteration is in the encrypted
list of accepting states.
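The selection step can be traced on a small automaton. The sketch below is a deliberately simplified, single-key illustration: one party holds the whole secret key, and the zero-knowledge proofs, the threshold decryption and the additive masking are omitted; the helper names and the toy group ($p = 2039$, $q = 1019$, $g = 4$) are assumptions. It keeps the core idea: subtract the current state from the $\epsilon$ column, permute that vector and the $t_i$ column with one shared permutation, mask multiplicatively, and take the candidate next state at the unique position that decrypts to $g^0$. Starting from an encryption of the initial state 1 and applying the step to every bit gives the same result as letting $P_2$ pick $c_{1,t_1}$ directly in the first iteration.

```python
import random
import secrets

# Toy exponential El Gamal over the order-q subgroup of Z_p^* (p = 2q + 1); insecure sizes.
P, Q, G = 2039, 1019, 4


def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return pow(G, x, P), x


def enc(h, m):
    r = secrets.randbelow(Q)
    return (pow(G, r, P), pow(h, r, P) * pow(G, m % Q, P) % P)


def mul(c, d):
    return (c[0] * d[0] % P, c[1] * d[1] % P)


def inv(c):
    return (pow(c[0], P - 2, P), pow(c[1], P - 2, P))


def power(c, e):
    return (pow(c[0], e, P), pow(c[1], e, P))


def dec(x, c):
    return c[1] * pow(c[0], Q - x, P) % P          # g^m


def reencrypt(h, c):
    return mul(c, enc(h, 0))


def encrypt_table(h, delta, n_states):
    """Augmented table: column "eps" holds each row's own label, Delta(j, eps) = j."""
    rows = range(1, n_states + 1)
    table = {"eps": [enc(h, j) for j in rows]}
    for b in (0, 1):
        table[b] = [enc(h, delta[(j, b)]) for j in rows]
    return table


def next_state(h, x, table, c_cur, bit):
    """Turn Enc(cur) into Enc(Delta(cur, bit)) without decrypting cur itself."""
    n = len(table["eps"])
    diffs = [mul(c_cur, inv(c)) for c in table["eps"]]   # Enc(cur - j): 0 only at j = cur
    perm = list(range(n))
    random.SystemRandom().shuffle(perm)                   # same permutation for both vectors
    diffs = [reencrypt(h, diffs[k]) for k in perm]
    column = [reencrypt(h, table[bit][k]) for k in perm]  # candidate next states for this bit
    masked = [power(c, secrets.randbelow(Q - 1) + 1) for c in diffs]  # 0 stays 0
    hit = next(j for j, c in enumerate(masked) if dec(x, c) == 1)
    return reencrypt(h, column[hit])
```

A usage example: with the three-state KMP automaton for the pattern 01 (transitions `{(1, 0): 2, (1, 1): 1, (2, 0): 2, (2, 1): 3, (3, 0): 2, (3, 1): 1}`), feeding the bits of 1001 one at a time yields an encryption of state 3, the accepting state.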

There are several technical challenges in constructing such a secure protocol.
In particular, identifying the next encrypted state without leaking additional
information requires a couple of rounds of interaction between the parties in
which they mask and permute the ciphertext vector containing all possible states,
in order to “destroy any link” between their input and the next encrypted state.
Moreover, in order to protect against malicious behavior, zero-knowledge proofs
are included at each step to make sure the parties behave according to the protocol
specification. We are now ready to present a formal description of our protocol.

As a final remark, we note that due to technicalities that arise in the security
proof, our protocol employs an unnatural masking technique: instead of either
multiplying or adding a random value to each encryption, it uses both. The
reason for this becomes clear in the proof.
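The arithmetic effect of the two masks on a vector of encrypted differences can be sketched as follows. This is an illustration under assumptions (toy group $p = 2039$, $q = 1019$, $g = 4$; hypothetical helper names; one party holds the whole key; no proofs), and it does not reproduce the security argument that motivates combining the masks. Multiplicative masking keeps an encryption of 0 at 0 and maps any nonzero plaintext to a random nonzero one; additive masking then shifts entry $j$ by $\mu^1_j + \mu^2_j$, so the current-state index is the one whose decryption equals $g^{\mu^1_j + \mu^2_j}$.

```python
import secrets

P, Q, G = 2039, 1019, 4   # toy exponential El Gamal group (p = 2q + 1); insecure sizes


def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return pow(G, x, P), x


def enc(h, m):
    r = secrets.randbelow(Q)
    return (pow(G, r, P), pow(h, r, P) * pow(G, m % Q, P) % P)


def mul(c, d):
    return (c[0] * d[0] % P, c[1] * d[1] % P)


def power(c, e):
    return (pow(c[0], e, P), pow(c[1], e, P))


def dec(x, c):
    return c[1] * pow(c[0], Q - x, P) % P          # g^m


def reencrypt(h, c):
    return mul(c, enc(h, 0))


def multiplicative_mask(h, c):
    """m -> s*m for a random nonzero s: an encryption of 0 stays 0, other values become random."""
    return reencrypt(h, power(c, secrets.randbelow(Q - 1) + 1))


def additive_mask(h, c, mu):
    """m -> m + mu by multiplying the second component by g^mu, then re-randomizing."""
    return reencrypt(h, (c[0], c[1] * pow(G, mu, P) % P))


def locate_zero(h, x, diffs):
    """Indices j whose masked plaintext equals mu1_j + mu2_j, i.e. entries that encrypted 0."""
    n = len(diffs)
    mu1 = [secrets.randbelow(Q) for _ in range(n)]      # P_1's additive shares
    mu2 = [secrets.randbelow(Q) for _ in range(n)]      # P_2's additive shares
    masked = [multiplicative_mask(h, multiplicative_mask(h, c)) for c in diffs]
    masked = [additive_mask(h, additive_mask(h, c, a), b) for c, a, b in zip(masked, mu2, mu1)]
    return [j for j, c in enumerate(masked) if dec(x, c) == pow(G, (mu1[j] + mu2[j]) % Q, P)]
```

Since $q$ is prime and both multiplicative masks are nonzero, a nonzero difference can never collide with $g^{\mu^1_j + \mu^2_j}$ in this sketch, so exactly one index is found.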

Protocol 2. $\pi_{\mathrm{AUTO}}$

— Inputs: The input of $P_1$ is a description of an automaton $\Gamma = (Q, \{0,1\}, \Delta,
q_1, F)$, and the input of $P_2$ is a binary string $t = t_1, \ldots, t_\ell$.

— Auxiliary Inputs: $|Q|$ and $|F|$ for $P_2$, $\ell$ for $P_1$, and the security parameter
$1^n$ for both.

— Conventions: We assume that the parties jointly agree on a group G of 
prime order q and a generator g for the threshold El Gamal encryption 
scheme. Both parties check every received ciphertext for validity, and abort 
if an invalid ciphertext is received. 

The transition table $\Delta$ is defined by $\Delta(j, i)$, which denotes the state that
follows state $j$ if the input letter is $i$. We augment this table by adding a
column to $\Delta$ labeled by $\epsilon$: we define $\Delta(j, \epsilon) = j$ (the reader can think of it
as the “label” of the $j$th row of the transition table).

Since we are assuming a binary alphabet, unless written differently,
$j \in \{1, \ldots, |Q|\}$ and $i \in \{\epsilon, 0, 1\}$. Finally, we assume that the initial state is
labeled 1.

— The protocol:

1. El Gamal key setup:

(a) Party $P_1$ chooses a random value $r_1 \leftarrow_R \mathbb{Z}_q$ and sends party $P_2$ the
value $g_1 = g^{r_1}$. It proves knowledge of $r_1$ using $\pi_{\mathrm{DL}}$.

(b) Party $P_2$ chooses a random value $r_2 \leftarrow_R \mathbb{Z}_q$ and sends party $P_1$ the
value $g_2 = g^{r_2}$. It proves knowledge of $r_2$ using $\pi_{\mathrm{DL}}$. The parties set
$pk = (\mathbb{G}, q, g, h = g_1 \cdot g_2)$ (i.e., the secret key is $(r_1 + r_2) \bmod q$).

2. Encrypting $P_1$’s transition table and accepting states:

(a) $P_1$ encrypts its (augmented) transition table $\Delta$ under $pk$ componentwise:
$\Delta_E = \{c_{j,i} = E_{pk}(\Delta(j,i))\}_{j,i}$. Notice that $c_{j,\epsilon}$ is an encryption
of the state $j$. $P_1$ also sends the list of encrypted accepting states,
denoted by $E(F) = \{E_{pk}(f)\}_{f \in F}$.

(b) For every encryption $(c_a, c_b) \in \Delta_E \cup E(F)$, $P_1$ proves knowledge
of $\log_g c_a$ using $\pi_{\mathrm{DL}}$.

(c) Proving the validity of the encrypted transition matrix. $P_1$ proves that
$\Delta_E$ is a set of encryptions of values from the set $\{1, \ldots, |Q|\}$. It first
sorts the encryptions according to their encrypted values, denoted by
$c_1, \ldots, c_{3|Q|}$. $P_1$ multiplies every encryption in this set with a random
encryption of 0, sends the result to $P_2$ and proves: firstly, that this vector
is a permutation of $\Delta_E$, using $\pi_{\mathrm{PERM}}$; further, that $C_j = c_j/c_{j-1} \in
\{E_{pk}(0), E_{pk}(1)\}$, by proving that either $(pk, C_j)$ or $(pk, C_j/E_{pk}(1))$ is
a Diffie-Hellman tuple; and, finally, that $\prod_j C_j = E_{pk}(|Q|)$. $P_1$ also
decrypts $c_{3|Q|}$, which is always an encryption of $|Q|$.

3. First iteration:

(a) $P_2$ chooses the encryption of the next state $c_{1,t_1} = E_{pk}(\Delta(1, t_1))$. It
then defines $c_1 = c_{1,t_1} \cdot E_{pk}(0)$, i.e., a random encryption of the next
state, and sends it to $P_1$.

(b) $P_2$ proves that $D_{sk}(c_1) \in \{D_{sk}(c_{1,0}), D_{sk}(c_{1,1})\}$ using the zero-knowledge
proof $\pi_{\mathrm{ENC}}$ for $m = 1$.

4. Iterations $\{2, \ldots, \ell\}$: for every $\ell' \in \{2, \ldots, \ell\}$, let $c_{\ell'-1}$ denote the
encryption of $\Delta(1, (t_1, \ldots, t_{\ell'-1}))$; the parties continue as follows:

(a) Subtracting the current state from the state labels in $\Delta$:
The parties compute the vector of encryptions $C = \{c_{\ell'-1}/c_{j,\epsilon}\}$ for
all $j$. Note that only one ciphertext will be an encryption of 0,
and that indicates the position corresponding to the current state.

(b) $P_2$ permutes $C$ and column $t_{\ell'}$:

• $P_2$ computes $C' = \{c_{j,t_{\ell'}} \cdot E_{pk}(0)\}$ for all $j$ (note that $C'$ corresponds
to column $t_{\ell'}$ in the transition matrix, i.e., the encryptions
of all the possible next states given input bit $t_{\ell'}$) and sends $C'$ to
$P_1$. It also proves that $C'$ was computed correctly using $\pi_{\mathrm{ENC}}$.

• $P_2$ chooses a random permutation $\pi$ over $\{1, \ldots, |Q|\}$ and sends
$P_1$ a randomized version of $\{\pi(C), \pi(C')\}$. That is, the ciphertexts
are permuted and randomized by multiplication with $E_{pk}(0)$.
The parties engage in the zero-knowledge proof $\pi_{\mathrm{PERM}}$, where $P_2$
proves that it computed this step correctly.
(c) $P_1$ permutes $\pi(C)$ and column $t_{\ell'}$: Let $C_\pi, C'_\pi$ denote the
permuted columns that $P_2$ sent. If $P_1$ accepts the proof $\pi_{\mathrm{PERM}}$, it
continues similarly by permuting and randomizing $C_\pi, C'_\pi$ using a new
random permutation $\pi'$. $P_1$ proves its computations using $\pi_{\mathrm{PERM}}$.

(d) Multiplicative masking: Let $C_{\pi'}, C'_{\pi'}$ denote the permuted
columns from the previous step. $C_{\pi'}$ corresponds to the permuted $\epsilon$
column from the transition matrix, which contains the labels of the
states minus the label of the current state. The parties take turns in
masking $C_{\pi'}$ as follows: for every $(c_{j,a}, c_{j,b}) \in C_{\pi'}$, $P_2$ chooses $x \leftarrow_R \mathbb{Z}_q$
and computes $c'_j = (c_{j,a}^x, c_{j,b}^x)$. It then proves that $(\mathbb{G}, c_{j,a}, c'_{j,a}, c_{j,b}, c'_{j,b})$
is a Diffie-Hellman tuple using $\pi_{\mathrm{DDH}}$. (Encryptions of zero will be
unaffected by this step, while non-zero values will be mapped to random
values.) $P_1$ repeats this step and masks the result. Let $C_1, C_2$ be the
resulting masked columns for $P_1$ and $P_2$, respectively.

(e) Additive masking: C_1 corresponds to the permuted e column of the transition matrix, which by now has been masked by both parties. P_2 chooses |Q| random values $\mu_1^2, \ldots, \mu_{|Q|}^2$ and encrypts them as $\gamma_i^2 = E_{pk}(\mu_i^2)$. P_2 also computes $\hat{c}_i^2 = (c_i \cdot_G \gamma_i^2) \cdot_G E_{pk}(0)$ for every $c_i \in C_2$, and proves in zero-knowledge that the masking is correct by proving, using π_DDH, that for every i

$$ \left( \frac{\hat{c}_{i,a}}{c_{i,a} \cdot \gamma_{i,a}},\ \frac{\hat{c}_{i,b}}{c_{i,b} \cdot \gamma_{i,b}} \right) $$

is an encryption of zero. The ciphertext c_i that is an encryption of zero is thereby mapped to a ciphertext ĉ_i that encrypts μ_i; the others are mapped to random values. Let $\hat{C}_2 = [\hat{c}_1, \ldots, \hat{c}_{|Q|}]$ denote this masked vector. P_1 performs this step as well to obtain $\hat{C}_1$.

(f) Decrypting column e: The parties decrypt it using their shared knowledge of sk. In particular, for every $c_j \in C$ with $c_j = (c_{j,a}, c_{j,b})$, P_1 computes $c'_j = c_{j,a}^{r_1}$ and proves that $(G, g, g^{r_1}, c_{j,a}, c'_j)$ is a Diffie-Hellman tuple. Next, P_2 computes $c''_j = c_{j,a}^{r_2}$ and proves that $(G, g, g^{r_2}, c_{j,a}, c''_j)$ is a Diffie-Hellman tuple. The parties decrypt c_j by computing $D_{sk}(c_j) = c_{j,b}/(c'_j \cdot c''_j)$. Each party P_i sends its additive shares $\mu_1^i, \ldots, \mu_{|Q|}^i$ and proves their correctness via π_DDH. The parties choose the index j for which $D_{sk}(c_j) = g^{\mu_j^1 + \mu_j^2}$ (with high probability there is only one such index).

5. Checking the output: After the ℓth iteration, c_ℓ contains the encryption of Δ(1, t). To check whether this is an accepting state without revealing any other information (in particular, which state it is), the parties do the following:

(a) They compute the ciphertext vector $C_F = \{c_\ell / c_f\}_{c_f \in E_F}$. Notice that Δ(1, t) is accepting if and only if one of these ciphertexts is an encryption of 0.


(b) P_2 masks the ciphertexts as in Steps 4(d) and 4(e). Let C'_F be the resulting vector.

(c) P_2 randomizes and permutes C'_F. It also proves correctness using π_PERM. Let C''_F be the resulting vector.

(d) The parties decrypt all the ciphertexts in C''_F, with the result going only to P_1: for every ciphertext $c = (c_a, c_b) \in C''_F$, party P_2 sends $c'_a = c_a^{r_2}$ and proves that $(G, g, g^{r_2}, c_a, c'_a)$ is a Diffie-Hellman tuple. This information allows P_1 to decrypt the ciphertexts, and P_1 accepts if one of the decryptions equals one of P_2's additive masks.
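To make the arithmetic behind Steps 4(a) and 4(d) to 4(f) concrete, here is a toy Python sketch with exponential El Gamal and a secret key shared as sk = r_1 + r_2. It is an illustration under stated assumptions, not the protocol itself: the group parameters P = 2039, Q = 1019, G = 4 are hypothetical toy values (far too small for security), both parties' random choices come from one local RNG, and the permutations, π_PERM and all other zero-knowledge proofs are omitted. Function names such as `locate_state` are hypothetical.

```python
import random

# Toy group (insecure, illustration only): P = 2Q + 1 is a safe prime and
# G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def enc(pk, m, rng):
    """Exponential El Gamal: E_pk(m) = (g^r, pk^r * g^m)."""
    r = rng.randrange(1, Q)
    return (pow(G, r, P), pow(pk, r, P) * pow(G, m % Q, P) % P)


def add(c, d):
    """Component-wise product of ciphertexts: plaintexts are added."""
    return (c[0] * d[0] % P, c[1] * d[1] % P)


def sub(c, d):
    """Component-wise quotient of ciphertexts: plaintexts are subtracted."""
    return (c[0] * pow(d[0], -1, P) % P, c[1] * pow(d[1], -1, P) % P)


def scale(c, x):
    """Raise both components to x: the plaintext is multiplied by x."""
    return (pow(c[0], x, P), pow(c[1], x, P))


def joint_decrypt(c, r1, r2):
    """Both key shares contribute c_a^{r_i}; the result is g^m, not m."""
    share1, share2 = pow(c[0], r1, P), pow(c[0], r2, P)
    return c[1] * pow(share1 * share2 % P, -1, P) % P


def locate_state(labels, current, seed=0):
    """One iteration of Steps 4(a) and 4(d)-4(f), without permutations or proofs."""
    rng = random.Random(seed)
    r1, r2 = rng.randrange(1, Q), rng.randrange(1, Q)  # secret-key shares
    pk = pow(G, r1 + r2, P)
    c_cur = enc(pk, current, rng)
    column_e = [enc(pk, label, rng) for label in labels]
    # (a) Encrypted differences current - label_j; exactly one encrypts 0.
    vec = [sub(c_cur, c) for c in column_e]
    # (d) Multiplicative masking by each party with non-zero exponents.
    for _party in range(2):
        vec = [scale(c, rng.randrange(1, Q)) for c in vec]
    # (e) Additive masking: each party adds encryptions of its own mu values.
    mu1 = [rng.randrange(Q) for _ in labels]
    mu2 = [rng.randrange(Q) for _ in labels]
    vec = [add(add(c, enc(pk, a, rng)), enc(pk, b, rng))
           for c, a, b in zip(vec, mu1, mu2)]
    # (f) Joint decryption: only the former zero decrypts to g^(mu1_j + mu2_j).
    return [j for j, c in enumerate(vec)
            if joint_decrypt(c, r1, r2) == pow(G, mu1[j] + mu2[j], P)]
```

For example, `locate_state([1, 2, 3, 4, 5], 3)` returns `[2]`, the 0-based position of label 3. In this toy the labels are much smaller than the prime Q, so a non-zero difference multiplied by non-zero masks never becomes zero, and the selected index is always unique.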

Before turning to the security proof, we note that if both parties are honest then P_1 outputs Γ(t) with probability negligibly close to 1. This is because in each iteration i the parties agree on the correct encrypted state with probability close to 1. We continue with the following claim:

Theorem 3. Assume that π_DL, π_DDH, π_ENC and π_PERM are as described above and that (G, E, D) is the semantically secure El Gamal encryption scheme. Then π_AUTO securely computes F_AUTO in the presence of malicious adversaries.

Intuitively, it should be clear that the automaton and the text remain secret if the encryption scheme is secure. However, a formal proof of Theorem 3 is actually quite involved. Consider, for example, the case in which P_1 is corrupted and we need to simulate P_2. The simulator chooses a random input and runs P_2's code on it. Then, to prove that this view is indistinguishable, we need a reduction to the encryption scheme. A straightforward reduction does not work for the following reason: in order to finish the execution correctly, the simulator must “know” the current state at every iteration i, but when we reduce to El Gamal we need to plug in a ciphertext for the current state whose decryption we do not know, and this prevents us from proceeding to iteration i + 1. A non-trivial solution to this problem is to prove that the real and simulated views are indistinguishable via a sequence of hybrid games, in which indistinguishable changes are introduced to the way the simulator works while still allowing it to finish the simulated execution.

As for the case in which P_2 is corrupted, the proof is rather simple, mainly because P_2 does not receive an output. Specifically, the simulator extracts P_2's input in every iteration using the extractor for π_ENC.

Efficiency. We present an analysis of our protocol and compare its efficiency to the generic protocols of mm for secure two-party computation in the presence of malicious adversaries. We discussed the efficiency of the generic protocols in Sec. CD. A circuit that computes F_AUTO would require O(ℓQ log Q) gates. While there exists a sequential circuit of size O(ℓQ) that computes a Q-state automaton over a binary input of length ℓ, the generic protocol applies only to combinatorial circuits. Hence, our protocol improves the computational complexity over the generic solution by a factor of n log Q or log Q operations compared to [2] and [TO, respectively³. In comparison to urn, we have the following:

³ Although not presented here, our protocol generalizes to strings over a larger alphabet, which provides another log |Σ| factor improvement, where |Σ| is the size of the alphabet.
1. Rounds of communication: Our protocol runs in O(ℓ) rounds, where ℓ is the length of the text, compared to the constant round complexity of mm. This is because the parties cannot initiate a new iteration before completing the previous one. By applying known techniques, the round complexity can be reduced to O(m) (i.e., the length of the pattern) by breaking the text into blocks of length 2m; see Sect. 4 for more details. We note that the length of the pattern is typically very small, often bounded by a constant.
An additional improvement can be achieved if some leakage is allowed. In particular, since the number of rounds is determined by the length of the pattern, the pattern can be broken into smaller blocks and the output combined from the results. This releases additional information about the text, as it then becomes possible to identify occurrences in the text of substrings of the input pattern.

2. Asymmetric computations: The overall number of exponentiations in protocol π_VALIDAUTO, including the zero-knowledge proofs, is O(m · ℓ). As for [H], the number of such computations depends on the number of oblivious transfer executions, which is bounded by max(4ℓ, 8s), and on the number of commitments, 2ℓs(s + 1), where s must be large enough that the resulting error probability is sufficiently small. Finally, [TB] requires O(|C|) such operations. However, after carefully examining these costs, it seems that the parties compute approximately 720 RSA exponentiations per gate, where the number of gates is O(m · ℓ), which is clearly not practical. Furthermore, the protocol of [TB] also requires a security parameter, usually of size at least 1024 bits, due to the strong RSA assumption. Consequently, all the public-key operations are performed over groups modulo this value (or larger, such as N²). In contrast, our protocol uses the El Gamal encryption scheme, which can be implemented over an elliptic curve group, typically using only 160-bit keys.

3. Symmetric operations: Due to its use of a symmetric encryption scheme, [14] also requires O(s|C| + s² · n) symmetric-key operations.

4. Bandwidth: In our protocol, the parties send each other O(m · ℓ) encryptions. The bandwidth of the protocol in [16] is similar (again, with relatively high constants), whereas in [HQ the bandwidth is dominated by the O(s|C| + s²n) symmetric-key encryptions. In [15], it is shown that communication is the main bottleneck when implementing the malicious protocol of la.
In order to conclude the comparison, we note that implementing a circuit that solves the basic pattern matching problem may be significantly harder than implementing our protocol.

3.4 A Zero-Knowledge Proof of Knowledge for a KMP Automaton 

Space does not permit a full treatment of our protocol π_VALIDAUTO, but we include a description of the protocol here:
Protocol 3. π_VALIDAUTO

— Auxiliary input for the prover: A collection $\{Q_{j,i}, r_{j,i}\}_{j,i}$ of |Q| sets, each of size 2, such that $c_{j,i} = E_{pk}(Q_{j,i}; r_{j,i})$ for all i ∈ {0, 1} and j ∈ {1, ..., |Q|}.

— Auxiliary inputs for both: A prime p such that p − 1 = 2q for a prime q, the description of a group G of order q for which the DDH assumption holds, and a generator g of G.

The protocol: 

1. For every $c_{j,i} = (\alpha_{j,i}, \beta_{j,i})$, the prover P proves knowledge of $\log_g \alpha_{j,i}$ using π_DL.

2. For every row $\Delta_j = (c_{j,0}, c_{j,1})$, j = 1, ..., |Q|, of the transition matrix, P proves the following:

(a) It first randomly permutes c_{j,0} and c_{j,1} and employs π_PERM to prove its computations.

(b) It proves that there exists b ∈ {0, 1} for which $c_{j,b} = E_{pk}(j + 1)$ by proving that $(pk, c_{j,b}/E_{pk}(j + 1))$ is a Diffie-Hellman tuple.

(c) P proves the correctness of $c_{j,1-b}$ in two steps: (i) it first proves that it corresponds to a valid prefix of p, and (ii) it then proves the maximality of this prefix ($p^{(r)}$ denotes the length-r prefix p_1, ..., p_r of p).

i. Let m = |Q| − 1. The verifier V chooses m random elements $u_a \leftarrow_R \mathbb{Z}_q$ and sends $\{u_a\}_a$ to P. Next, both parties use the homomorphic properties of the encryption scheme to compute an encryption

$$ v_{a',a''} = E_{pk}\Big( \sum_{k=a'}^{a''} u_{k-a'+1}\, p_k \Big) $$

for all a', a'' ∈ {1, ..., m} with a' ≤ a''.

ii. Proving the existence of a prefix that matches a suffix of $p^{(j-1)}$: For all 1 ≤ k ≤ j − 1, the parties compute an encryption

$$ v'_k = \frac{v_{j-k,\,j-1}}{v_{1,k}} \cdot \frac{c_{j,1-b}}{E_{pk}(k)}. $$

P then proves that there exists k for which v'_k is a Diffie-Hellman tuple (i.e., an encryption of zero).

iii. Proving that $p^{(D_{sk}(c_{j,1-b}))}$ corresponds to the longest suffix of $p^{(j-1)}$: Next, P proves that there does not exist an index k with $D_{sk}(c_{j,1-b}) < k \le j - 1$ for which $v_{j-k,j-1}/v_{1,k}$ encrypts 0 yet $c_{j,1-b}/E_{pk}(k)$ does not, as this would imply that $p^{(k)}$ is a longer suffix of $p^{(j-1)}$ that matches a prefix of p even though $D_{sk}(c_{j,1-b}) \ne k$.

• Therefore, for every 1 ≤ k ≤ j − 2 and every 2 ≤ k' ≤ j − 1, the parties compute an encryption $e_{k,k'} = v'_k \cdot (v_{j-k',\,j-1}/v_{1,k'})$, for which P then proves, using π_NZ, that e_{k,k'} is not an encryption of zero.

3. If all the proofs are successfully completed, V outputs 1. Otherwise it 
outputs 0. 

We refer the reader to the extended version of this paper for a detailed definition 
and proof. 
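Protocol 3 lets P_1 prove, under encryption, that its transition matrix is a KMP automaton: one transition from each state advances to the next state, and the other falls back to the longest prefix of p that is a suffix of what has been read. The plain (unencrypted) Python sketch below builds such an automaton for a binary pattern using a standard KMP construction and checks every transition against that definition. It is only an illustration: states here are 0-based (state j = number of matched pattern bits), whereas the protocol's labels start at 1, and the function names are hypothetical.

```python
def kmp_automaton(pattern):
    """KMP automaton for a non-empty binary pattern; state j = matched prefix length."""
    m = len(pattern)
    delta = [[0, 0] for _ in range(m + 1)]
    delta[0][pattern[0]] = 1
    x = 0  # state reached on pattern[1:j]; mismatches fall back to it
    for j in range(1, m + 1):
        delta[j] = list(delta[x])  # copy the fallback transitions
        if j < m:
            delta[j][pattern[j]] = j + 1  # the "match" transition
            x = delta[x][pattern[j]]
    return delta


def longest_border_transition(pattern, j, b):
    """Definition: longest prefix of pattern that is a suffix of pattern[:j] + [b]."""
    s = pattern[:j] + [b]
    for k in range(min(len(s), len(pattern)), -1, -1):
        if s[len(s) - k:] == pattern[:k]:
            return k


def kmp_matches(pattern, text):
    """Run the automaton over the text and report 0-based match start positions."""
    delta = kmp_automaton(pattern)
    state, found = 0, []
    for i, bit in enumerate(text):
        state = delta[state][bit]
        if state == len(pattern):
            found.append(i - len(pattern) + 1)
    return found
```

For instance, `kmp_automaton([1, 0, 1])` yields the rows `[0, 1], [2, 1], [0, 3], [2, 1]`, and `kmp_matches([1, 0, 1], [1, 0, 1, 0, 1])` reports the overlapping matches at 0 and 2.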
4 Text Search Protocol with Simulation Based Security

In this section we present our complete and main construction for securely evaluating the pattern matching functionality F_PM, defined by



Recall that our construction is presented in the malicious setting with full simulatability and is modular in the sub-protocols π_AUTO and π_VALIDAUTO. Having described the sub-protocols incorporated in our scheme, we are now ready to describe it formally. Our protocol consists of two main phases: first, the parties engage in an execution of π_VALIDAUTO in which P_1 proves that it indeed sent a valid KMP automaton; this is followed by an execution of π_AUTO, which is an evaluation of Γ on P_2's private input.

In order to reduce the round complexity of the protocol, long texts are partitioned into blocks of length 2m that are handled separately, so that the KMP algorithm is employed on each block independently (thus all these executions can be run in parallel). That is, let T = t_1, ..., t_ℓ; then the text is partitioned into the blocks (t_1, ..., t_{2m}), (t_m, ..., t_{3m}), (t_{2m}, ..., t_{4m}), and so on, such that every two consecutive blocks overlap in m bits. This ensures that all the matches are found. Therefore, the total number of blocks is ℓ/m.
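The block decomposition above can be checked in the clear with a short Python sketch. It uses 0-based indexing with blocks of length 2m starting every m positions, so consecutive blocks overlap in exactly m positions; this indexing convention and the function names are assumptions made for the sketch. A match starting at position s lies entirely inside the block that starts at m·⌊s/m⌋, so no match is lost, while a match starting on a block boundary is found twice and must be de-duplicated.

```python
def naive_matches(pattern, text):
    """All 0-based start positions of pattern in text."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def block_starts(text_len, m):
    """Start offsets of the overlapping length-2m blocks (about text_len/m of them)."""
    return range(0, text_len - m + 1, m)


def blockwise_matches(pattern, text):
    """Search each block independently, then merge (and de-duplicate) the results."""
    m = len(pattern)
    found = set()
    for start in block_starts(len(text), m):
        block = text[start:start + 2 * m]
        found.update(start + i for i in naive_matches(pattern, block))
    return sorted(found)
```

For example, with a text of length 12 and m = 3 there are 4 blocks, and `blockwise_matches([1, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1])` returns `[0, 2, 5]`, the same as a direct search.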

Protocol 4. π_PM

— Inputs: The input of P_1 is a binary pattern p = p_1, ..., p_m, and the input of P_2 is a binary string T = t_1, ..., t_ℓ.

— Auxiliary inputs: the security parameter 1^n, and the input sizes ℓ and m.

The protocol: 

1. P_1 constructs its automaton Γ = (Q, Σ, Δ, q_1, F) according to the KMP specifications based on its input p and sends P_2 encryptions of the transition matrix Δ and the accepting states F, denoted by E_Δ and E_F.

2. The parties engage in an execution of the zero-knowledge proof π_VALIDAUTO, in which P_1 proves that Γ was constructed correctly. That is, P_1 proves that the set E_Δ corresponds to a valid KMP automaton and a well-defined input pattern of length m. If P_2's output from this execution is 1, the parties continue to the next step; otherwise P_2 aborts.

3. P_2 sends an encryption of T to P_1, and the parties partition T into ℓ/m blocks of length 2m in which every two consecutive blocks overlap in m bits.

4. The parties engage in ℓ/m parallel executions of π_AUTO on these blocks⁴. For every 1 ≤ i ≤ ℓ/m, let $\{output_j^i\}_{j=1}^{m+1}$ denote the set of P_1's outputs from the ith execution. Then P_1 returns {j | output_j^i = “accept”}.

⁴ The parties run a slightly modified version of π_AUTO in which they carry out Step 5 (verifying acceptance) m + 1 times, as each block potentially contains m + 1 matches. This step can be executed in parallel for all block locations.
Theorem 4. Assume that π_AUTO and π_VALIDAUTO are as described above and that (G, E, D) is the semantically secure El Gamal encryption scheme. Then π_PM securely computes F_PM in the presence of malicious adversaries.

The security proof for π_PM is a combination of the proofs described for π_AUTO and π_VALIDAUTO and is therefore omitted here.

Efficiency. Since the costs are dominated by those of π_AUTO, we refer the reader to the detailed analysis presented in Sect. 3.3. The overall costs amount to O(m · ℓ + m²) = O(m · ℓ), since in most cases m ≪ ℓ.

5 Conclusion 

Our protocols for oblivious automata evaluation and pattern matching operate in the standard model and require neither a CRS nor a cut-and-choose strategy. With P_1 holding the pattern of length m and P_2 holding the text of length ℓ, both protocols incur communication and computation costs of O(ℓ · m), which is even asymptotically better than the general construction for oblivious automata evaluation, which requires a circuit with O(ℓ · m log m) gates. Table 2 summarizes and compares the computational and communication costs of each of these schemes.


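Table 2. Summary of results

Lindell-Pinkas [14]: round complexity constant; communication complexity O(s|C| + s²m) times 128/256 bits; asymmetric computations max(4m, 8s) OTs; symmetric computations O(s|C| + s² · m).

Jarecki-Shmatikov [15]: round complexity constant; communication complexity O(ℓ · m) times 2048 bits; asymmetric computations 720 exponentiations per gate; symmetric computations none.

Our protocol: round complexity O(m); communication complexity O(ℓ · m) times 160 bits; asymmetric computations O(ℓ · m); symmetric computations none.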
Abstract. Pairings on elliptic curves over finite fields are crucial for constructing various cryptographic schemes. The η_T pairing on supersingular curves over GF(3^n) is particularly popular since it is efficiently implementable. Taking into account the Menezes-Okamoto-Vanstone (MOV) attack, the discrete logarithm problem (DLP) in GF(3^{6n}) becomes a concern for the security of cryptosystems using η_T pairings in this case. In 2006, Joux and Lercier proposed a new variant of the function field sieve in the medium prime case, named JL06-FFS. We have, however, not yet found any practical implementations of JL06-FFS over GF(3^{6n}). Therefore, we carry out the first such implementation and successfully set a new record for solving the DLP in GF(3^{6n}): the DLP in GF(3^{6·71}) of 676-bit size. In addition, we compare JL06-FFS and an earlier version, named JL02-FFS, with practical experiments. Our results confirm that the former is several times faster than the latter under certain conditions.

Keywords: function field sieve, discrete logarithm problem, pairing- 
based cryptosystems. 

1 Introduction 

Based on pairings, many novel cryptographic protocols have been constructed, such as identity-based encryption [5], forward-secure cryptosystems, proxy cryptosystems, and keyword-searchable PKE [7]. As a result, two requirements have arisen: efficient pairing computation and security parameter selection.

The η_T pairing [5] on supersingular curves over GF(3^n)¹ has been efficiently implemented both in software and in hardware. Along with the increase in computation speed of the η_T pairing, one may ask whether cryptosystems based on the η_T pairing are still secure. It is well known that a discrete logarithm problem (DLP) on supersingular curves over GF(q) can be converted to a DLP in GF(q^m) (where q is a prime power and m is not larger than 6) [24]. Therefore, the DLP in GF(3^{6n}) is one of the most important problems in analyzing the cryptosystems constructed with the η_T pairing on supersingular curves over GF(3^n).

* This work was done while the authors were at Future University Hakodate.

¹ Here, n is a prime number such as n = 97, 163, and 193 m.

The function field sieve (FFS) is the most efficient algorithm for solving the DLP in finite fields of small characteristic. The complexity of the FFS for solving the DLP in GF(3^{6n}) is $L_{3^{6n}}[1/3, c]$ with a constant c, where

$$ L_{3^{6n}}[1/3, c] = \exp\Big( (c + o(1)) \,(\log 3^{6n})^{1/3} \,(\log\log 3^{6n})^{2/3} \Big). $$

Here o(1) stands for a function that converges to zero as n approaches infinity.
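To get a feel for how this expression grows with n, the Python sketch below evaluates log₂ of $L_{3^{6n}}[1/3, c]$ with the o(1) term simply dropped and with natural logarithms inside. The default constant c = (32/9)^{1/3} is the AH-FFS value mentioned in the next paragraph. Both choices are assumptions for illustration: without the o(1) term the numbers are not running-time or security estimates, and only their relative growth is meaningful. The function name is hypothetical.

```python
import math

AH_C = (32 / 9) ** (1 / 3)  # constant c of AH-FFS


def log2_L(n, c=AH_C):
    """log2 of L_{3^{6n}}[1/3, c] with the o(1) term dropped (natural logs inside)."""
    ln_q = 6 * n * math.log(3)  # log 3^{6n}
    exponent = c * ln_q ** (1 / 3) * math.log(ln_q) ** (2 / 3)
    return exponent / math.log(2)


for n in (19, 71, 97, 163, 193):
    print(n, round(log2_L(n), 1))
```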

The first FFS was proposed by Adleman [1] in 1994. Five years later, Adleman and Huang proposed an improved FFS (AH-FFS) with c = (32/9)^{1/3} [2]. In 2002, Joux and Lercier proposed a practical improvement of the FFS (JL02-FFS) [J5]. Since the definition polynomial of the function field in JL02-FFS can be selected more flexibly, JL02-FFS is more practical than AH-FFS, though its asymptotic complexity is the same as that of AH-FFS. Furthermore, by using JL02-FFS, Joux and Lercier succeeded in solving the DLP in GF(2^613), which set a new record for the bit size of a DLP solved in a finite field of characteristic two PD 2 ]. In 2006, Joux and Lercier proposed another variant of the FFS (JL06-FFS) [IEJ. JL06-FFS has the same asymptotic complexity as JL02-FFS for solving the DLP in GF(3^{6n}), where n is a prime number². This work suggested that JL06-FFS might be efficient for solving the DLP in extension fields of GF(3^6) of degree n. However, to our knowledge, there have been no practical experiments. Note that JL02-FFS can also be applied to extension fields of GF(3^6) of degree n, but [DE3 showed no advantage in using GF(3^6) as the base field.

Our contributions. We have conducted the first experiments on JL06-FFS in GF(3^{6n}). In JL06-FFS, GF(3^{6n}) is constructed as an extension field of GF(3^6) of degree n, and thus the Galois action can be exploited to reduce the number of required relations. With our implementation, we succeeded in solving the DLP in GF(3^{6·71}) of 676-bit size with about 33 days of computation, which is the new record for solving the DLP in GF(3^{6n}). Our work contributes to the selection of security parameters. Additionally, we compared JL06-FFS [18] with JL02-FFS [16], and the experimental results confirm that JL06-FFS is several times faster than JL02-FFS for n = 19 and 61.

The rest of the paper is organized as follows. In Section 2, we briefly review the FFS algorithm. In Section 3, we compare JL02-FFS with JL06-FFS with respect to the polynomial selection method and experimental results. In Section 4, we describe in detail our implementation for solving the DLP in GF(3^{6·71}), which is based on JL06-FFS. Concluding remarks are made in Section 5.


² When n is a composite number, this variant may have complexity $L_{3^{6n}}[1/3, 3^{1/3}]$ for solving the DLP in GF(3^{6n}) (when JL06-FFS has complexity $L_{q^m}[1/3, 3^{1/3}]$, we call it JL06-FFS-2). We do not deal with this case in this paper.
2 Outline of Function Field Sieve

In this section, we give an overview of the FFS [1], which consists of four steps: polynomial selection, collection of relations, linear algebra, and individual logarithm. We focus on the FFS for solving the DLP in extension fields of GF(3^6) of degree n and describe the four steps below. For more details, refer to related work such as |1I1211 fill 8|.

Throughout this paper, let γ be a generator of the multiplicative group of GF(3^{6n}) and a ∈ ⟨γ⟩; we then try to find the smallest positive integer log_γ a such that $\gamma^{\log_\gamma a} = a$, which is called the discrete logarithm.

1. Polynomial selection: Select f ∈ GF(3^6)[x] such that f is a monic irreducible polynomial of degree n; then GF(3^{6n}) = GF(3^6)[x]/(f). Next, find a polynomial H(x, y) ∈ GF(3^6)[x, y] satisfying the eight conditions proposed by Adleman [I]. Then there is a surjective homomorphism

$$ \phi : GF(3^6)[x, y]/(H) \to GF(3^{6n}) = GF(3^6)[x]/(f), \qquad y \mapsto m, $$

where m ∈ GF(3^6)[x] is such that H(x, m) ≡ 0 (mod f). Here we select the smoothness bound B and define a rational factor base B_R and an algebraic factor base B_A as follows:

$$ B_R = \{ p \in GF(3^6)[x] \mid \deg(p) \le B,\ p \text{ is irreducible} \}, $$
$$ B_A = \{ (p, y - t) \in \mathrm{Div}(GF(3^6)[x, y]/(H)) \mid p \in B_R,\ t \equiv m \pmod{p} \}, $$

where Div(GF(3^6)[x, y]/(H)) is the divisor group of GF(3^6)[x, y]/(H) and (p, y − t) is the divisor generated by p and y − t.

2. Collection of relations: For r, s ∈ GF(3^6)[x] of degree not larger than B, find at least (#B_R + #B_A) relatively prime pairs (r, s) such that

$$ rm + s = \prod_{p_i \in B_R} p_i^{a_i}, \qquad \langle ry + s \rangle = \sum_{(p_j, t_j) \in B_A} b_j \,(p_j, y - t_j). \qquad (1) $$

Such a pair (r, s) is called a double smooth pair. For each (r, s), compute

$$ rm + s, \qquad (2) $$
$$ (-r)^d H(x, -s/r). \qquad (3) $$

Polynomial (3) is said to be B-smooth if it factors into irreducible polynomials of degree not larger than B, and then we have

$$ (-r)^d H(x, -s/r) = \prod_{(p_j, t_j) \in B_A} p_j^{b_j}, \qquad (4) $$
where t_j is uniquely determined by r, s and p_j. Then the b_j in Equation (4) is exactly the same as the one in Equation (1). When both Polynomials (2) and (3) are B-smooth, the pair (r, s) is a double smooth pair. Eventually, we obtain the following relation:

$$ \sum_{p_i \in B_R} a_i \log_\gamma p_i \equiv \sum_{(p_j, t_j) \in B_A} b_j \log_\gamma K_j \pmod{(3^{6n} - 1)/(3^6 - 1)}, \qquad (5) $$

where

$$ K_j = \phi(\kappa_j)^{1/h}, \qquad (\kappa_j) = h\,(p_j, y - t_j), \qquad (6) $$

for the class number h of the quotient field of GF(3^6)(x)[y]/(H).

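3. Linear algebra: For the number R of relations, construct an R × (#B_R + #B_A) matrix M from the relations in Equation (5) and a (#B_R + #B_A)-dimensional column vector v as follows:

$$ M = \begin{pmatrix} a_1^{(1)} & \cdots & a_{\#B_R}^{(1)} & -b_1^{(1)} & \cdots & -b_{\#B_A}^{(1)} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_1^{(R)} & \cdots & a_{\#B_R}^{(R)} & -b_1^{(R)} & \cdots & -b_{\#B_A}^{(R)} \end{pmatrix}, \qquad v = \begin{pmatrix} \log_\gamma p_1 \\ \vdots \\ \log_\gamma p_{\#B_R} \\ \log_\gamma K_1 \\ \vdots \\ \log_\gamma K_{\#B_A} \end{pmatrix}. $$

Then we solve the linear equation

$$ Mv \equiv 0 \pmod{(3^{6n} - 1)/(3^6 - 1)}. \qquad (7) $$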


4. Individual logarithm: Find integers e_i, f_j such that

$$ \log_\gamma a \equiv \sum_{p_i \in B_R} e_i \log_\gamma p_i + \sum_{(p_j, t_j) \in B_A} f_j \log_\gamma K_j \pmod{(3^{6n} - 1)/(3^6 - 1)}, $$

and then compute the discrete logarithm log_γ a. This is done using the special-q descent method mm.
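The notion of B-smoothness used in the collection of relations step can be illustrated with a toy Python test. To keep the arithmetic short, this sketch works over GF(3) instead of GF(3^6) (a deliberate simplification, since GF(3^6) would need an extension-field implementation), and it uses exhaustive trial division by every monic polynomial of degree 1 to B, which is only feasible for tiny B. Trial division by reducible divisors is harmless: by the time a reducible polynomial is tried, its lower-degree factors have already been divided out. All names are hypothetical, and this is not the sieving method used in the paper.

```python
from itertools import product


def trim(a):
    """Reduce coefficients mod 3 and drop zero leading terms (lowest degree first)."""
    a = [c % 3 for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def divmod_poly(a, b):
    """Divide a by a monic b over GF(3); returns (quotient, remainder)."""
    a = trim(a)
    quot = [0] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift, coef = len(a) - len(b), a[-1]
        quot[shift] = coef
        for i, bc in enumerate(b):
            a[shift + i] = (a[shift + i] - coef * bc) % 3
        a = trim(a)
    return quot, a


def is_smooth(f, B):
    """True if non-zero f in GF(3)[x] factors into irreducibles of degree <= B."""
    f = trim(f)
    if not f:
        raise ValueError("zero polynomial")
    for d in range(1, B + 1):
        for low in product(range(3), repeat=d):
            p = list(low) + [1]  # monic polynomial of degree d
            while True:
                quot, rem = divmod_poly(f, p)
                if rem:
                    break
                f = quot
    return len(f) == 1  # only a non-zero constant remains
```

For example, x² + 1 (coefficients `[1, 0, 1]`) is irreducible over GF(3), so it is 2-smooth but not 1-smooth, whereas x² + 2 = (x + 1)(x + 2) is 1-smooth.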


3 Comparison of Polynomial Selection on JL02-FFS and 
JL06-FFS 

The two most efficient variants of the FFS for solving the DLP in GF(3^{6n}) are JL02-FFS and JL06-FFS. Although they have the same asymptotic complexity, there is a considerable difference between them for a fixed extension degree in practical use. The time complexities of JL02-FFS and JL06-FFS depend on the sizes of their respective sieving areas, i.e., the number of pairs (r, s); each size is derived in the following subsections. Note that our comparison is based merely on the size of the sieving area; a more detailed analysis should incorporate the non-integer smoothness bound estimated by Granger mm.
3.1 Polynomial Selection of JL02-FFS and Its Sieving Area

First we outline the polynomial selection of JL02-FFS, and then we estimate the size of the sieving area. To distinguish the notation from that of the previous section, we attach the subscript “02” to the symbols.

Let H_02(x, y) of degree d_02 in y be formed as a C_ab curve [25]:

$$ H_{02}(x, y) = h_{a,0}\, y^a + h_{0,b}\, x^b + \sum_{ib + ja < ab} h_{i,j}\, y^i x^j \qquad (h_{i,j} \in GF(3),\ h_{a,0}, h_{0,b} \ne 0). $$

Randomly choose polynomials u_1, u_2 ∈ GF(3)[x] of degree at most ⌊6n/d_02⌋, and try to find an irreducible polynomial $f_{02} = u_2^{d_{02}} H_{02}(x, -u_1/u_2) \in GF(3)[x]$ of degree 6n such that gcd(u_2, f_02) = 1, so that u_2 is invertible modulo f_02. Then there is a surjective homomorphism

$$ \phi_{02} : GF(3)[x, y]/(H_{02}) \to GF(3^{6n}) = GF(3)[x]/(f_{02}), \qquad y \mapsto -u_1/u_2, $$

where H_02(x, y) satisfies H_02(x, −u_1/u_2) ≡ 0 (mod f_02). In this polynomial selection, we need to modify Polynomial (2) to su_2 − ru_1. Since r and s are chosen in GF(3)[x] of degree not larger than B_02 in JL02-FFS, the size of the sieving area in the collection of relations step is

$$ 3^{B_{02}+1} \cdot 3^{B_{02}+1}. \qquad (8) $$

From the heuristic analysis in [15], JL02-FFS is optimized when we choose the smoothness bound B_02 as

$$ B_{02} = \left\lfloor (4/9)^{1/3} (6n)^{1/3} (\log_3 6n)^{2/3} \right\rceil, \qquad (9) $$

where ⌊·⌉ denotes rounding to the nearest integer, and the extension degree d_02 of H_02(x, y) as $d_{02} = \lfloor \sqrt{6n/(B_{02} + 1)} \rfloor$. For example, for n = 97, 163, 193, we have (n, B_02) = (97, 21), (163, 26), (193, 28).
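Equation (9) and Form (8) are easy to evaluate directly. The Python sketch below does so; reading the brackets in (9) as rounding to the nearest integer is an inference from the listed values (it reproduces every B_02 in Table 1, whereas rounding up would give 13 instead of 12 for n = 31), and the function names are hypothetical.

```python
import math


def b02(n):
    """Smoothness bound of Equation (9), rounded to the nearest integer."""
    x = (4 / 9) ** (1 / 3) * (6 * n) ** (1 / 3) * math.log(6 * n, 3) ** (2 / 3)
    return math.floor(x + 0.5)


def jl02_area(n):
    """Form (8): 3^(B_02 + 1) * 3^(B_02 + 1) candidate pairs (r, s) over GF(3)."""
    b = b02(n)
    return 3 ** (b + 1) * 3 ** (b + 1)


for n in (97, 163, 193):
    print(n, b02(n), f"{jl02_area(n):.1e}")
```

For n = 97, 163, 193 this reproduces the bounds 21, 26, 28 quoted above, and the areas agree with the JL02-FFS column of Table 1 (for example, 3^22 ≈ 3.1 × 10^10 for n = 19).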

3.2 Polynomial Selection of JL06-FFS and Its Sieving Area 

Next we outline the polynomial selection of JL06-FFS and estimate the size of its sieving area.

For each extension degree n of GF(3^6), we choose the smallest smoothness bound B_06 in JL06-FFS satisfying the following condition:

$$ (B_{06} + 1) \log(3^6) > \sqrt{n/B_{06}}\, \log(n/B_{06}). \qquad (10) $$

For example, for n = 97, 163, 193, we have (n, B_06) = (97, 3), (163, 4), (193, 4). Next, we choose positive integers d and d' such that $d \approx \sqrt{n/B_{06}}$ and $d' \approx \sqrt{n B_{06}}$, where dd' ≥ n. After that, we randomly generate g(y) ∈ GF(3^6)[y] of degree d and set H(x, y) = g(y) + x. Finally, we try to find an irreducible polynomial f in GF(3^6)[x] of degree n that divides H(x, m), where m ∈ GF(3^6)[x] of degree d' is chosen randomly. In this polynomial selection, the leading coefficients of Polynomials (2) and (3) both depend on r, so we avoid obtaining duplicate relations by fixing r to be monic. Therefore, the size of the sieving area in the collection of relations step is at most

$$ (3^6)^{B_{06}+1} \cdot (3^6)^{B_{06}}. \qquad (11) $$
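Condition (10) and Form (11) can likewise be evaluated with a few lines of Python. Since both sides of (10) are linear in the logarithm, the choice of base does not matter; the sketch uses natural logarithms. It is an illustrative computation only, and the function names are hypothetical.

```python
import math


def b06(n):
    """Smallest B satisfying Condition (10): (B + 1) log(3^6) > sqrt(n/B) log(n/B)."""
    B = 1
    while (B + 1) * math.log(3 ** 6) <= math.sqrt(n / B) * math.log(n / B):
        B += 1
    return B


def jl06_area(n):
    """Form (11): (3^6)^(B_06 + 1) * (3^6)^B_06."""
    B = b06(n)
    return (3 ** 6) ** (B + 1) * (3 ** 6) ** B


for n in (97, 163, 193):
    print(n, b06(n), f"{jl06_area(n):.1e}")
```

This reproduces (n, B_06) = (97, 3), (163, 4), (193, 4), and the resulting areas match the JL06-FFS column of Table 1.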


3.3 Comparison of Sieving Area 

We compare JL06-FFS with JL02-FFS with respect to the size of the sieving area in the collection of relations step for three classes of extension degree n: the experimental class {19, 31, 47, 61}, the medium-security class {97, 163, 193}, and the high-security class {239, 313, 353, 509}. Table 1 lists the smoothness bound and the size of the sieving area in each variant. For each n, we obtain the smoothness bound B_02 from Equation (9) and B_06 from Condition (10), and estimate the size of the sieving area by Form (8) in JL02-FFS and by Form (11) in JL06-FFS.


Table 1. Parameters and sieving area

| Class | n | 6n | B_02 | Size of sieving area (JL02-FFS) | B_06 | Size of sieving area (JL06-FFS) |
|---|---|---|---|---|---|---|
| Experimental | 19 | 114 | 10 | 3.1 × 10^10 | 1 | 3.9 × 10^8 |
| | 31 | 186 | 12 | 2.5 × 10^12 | 2 | 2.1 × 10^14 |
| | 47 | 282 | 15 | 1.9 × 10^15 | 2 | 2.1 × 10^14 |
| | 61 | 366 | 17 | 1.5 × 10^17 | 2 | 2.1 × 10^14 |
| Medium-security | 97 | 582 | 21 | 9.8 × 10^20 | 3 | 1.1 × 10^20 |
| | 163 | 978 | 26 | 5.8 × 10^25 | 4 | 5.8 × 10^25 |
| | 193 | 1158 | 28 | 4.7 × 10^27 | 4 | 5.8 × 10^25 |
| High-security | 239 | 1434 | 30 | 3.8 × 10^29 | 4 | 5.8 × 10^25 |
| | 313 | 1878 | 34 | 2.5 × 10^33 | 5 | 3.1 × 10^31 |
| | 353 | 2118 | 36 | 2.0 × 10^35 | 5 | 3.1 × 10^31 |
| | 509 | 3054 | 42 | 1.1 × 10^41 | 6 | 1.6 × 10^37 |


Figure 1 shows the size of the required sieving area over GF(3^{6n}). The sieving area in JL06-FFS is much smaller than that in JL02-FFS for every listed n except n = 31 and 163 (for n = 31 it is larger, and for n = 163 the two are of the same size). Moreover, the gap between the sieving areas of JL02-FFS and JL06-FFS tends to grow as n increases. The computational cost of the collection of relations step is closely related to the size of the sieving area, so the collection of relations step in JL06-FFS might be several times faster than that in JL02-FFS.

We have conducted experiments on the collection of relations step in JL02-FFS and JL06-FFS to confirm the difference between their computational costs for that step. The parameters for JL02-FFS and JL06-FFS are listed in Table 2. The curves that we used in our experiments are superelliptic curves, rather than the C_ab curves used in [15]. Note that we have only experimented with the experimental class n ∈ {19, 31, 47, 61}, not with the medium- and high-security classes.

Fig. 1. Size of sieving area over GF(3^{6n}) in JL02-FFS and JL06-FFS

Table 2. Parameters in our experiments

| n | Bit size of GF(3^{6n}) | 6n (JL02-FFS) | B_02 | H_02(x, y) | B_06 | H(x, y) (JL06-FFS) |
|---|---|---|---|---|---|---|
| 19 | 181 | 114 | 10 | y^4 + x | 1 | y^6 + x |
| 31 | 295 | 186 | 12 | y^4 + x | 2 | y^4 + x |
| 47 | 447 | 282 | 15 | y^4 + x | 2 | y^5 + x |
| 61 | 581 | 366 | 17 | y^5 + x | 2 | y^6 + x |
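The bit sizes in Table 2, and the 676-bit figure for n = 71, agree with the bit length of the field order 3^{6n}. The Python sketch below computes that bit length exactly with integer arithmetic; interpreting "bit size" as this bit length is an assumption that happens to reproduce all listed values, and the function name is hypothetical.

```python
def field_bits(n):
    """Bit length of the field order 3^(6n), computed exactly with integers."""
    return (3 ** (6 * n)).bit_length()


print([(n, field_bits(n)) for n in (19, 31, 47, 61, 71)])
```

Using exact integers avoids any floating-point rounding in 6n · log₂ 3 near an integer boundary.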

In our experiments, we used 96 cores, each with performance comparable to a 2.83 GHz Intel Xeon. We implemented the lattice sieve [26] in JL02-FFS as in piisirfi]. On the other hand, we implemented the polynomial sieve [10] in JL06-FFS, since we fixed r as a monic polynomial in the collection of relations step, so the lattice sieve might not be efficient. The details of our implementation of JL06-FFS are described in Section 4.

Figure 2 shows the time required by JL02-FFS and JL06-FFS to compute the entire sieving area in the collection of relations step over GF(3^{6n}) for n = 19, 31, 47, 61. Note that the time was estimated whenever the computation would take longer than one hour.
Fig. 2. Estimated time taken to compute the entire sieving area in the collection of relations step over GF(3^{6n}) in JL02-FFS and JL06-FFS


When n = 19 and 61, our implementation of JL06-FFS is faster than that of JL02-FFS, which confirms that JL06-FFS is more efficient than JL02-FFS for solving the DLP in GF(3^{6n}). In particular, when n = 61, our implementation of JL06-FFS takes about 66 days for the collection of relations step, whereas our implementation of JL02-FFS takes about 165 days for the same step; that is, the former is 2.5 times faster than the latter. Accordingly, we expect that JL06-FFS will be efficient for solving the DLP in GF(3^{6n}) for larger n.

4 Solving the DLP in GF(3^{6·71})

In this section, we report that the DLP in GF(3 6 ' 71 ) of 676-bit size is solved by 
improving JL06-FFS. In our implementation, we deal with four practical improve¬ 
ments, polynomial sieve, free relation, Galois action, and parallel Lanczos method. 

In particular, by using the polynomial $H(x, y) = y^6 + x$, we only need to find about 1/8 of the originally required relations in the collection of relations step. Furthermore, via the Galois action, the size of the matrix given by the relations is reduced to about 1/6 of the original. To the best of our knowledge, 676 bits is currently the record size for solving the DLP in $GF(3^{6n})$.
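A quick check of the "676-bit" figure: the field $GF(3^{6 \cdot 71})$ has $3^{426}$ elements, and the bit length of that integer can be computed exactly. The helper name `field_bits` is ours, used only for illustration.

```python
def field_bits(n):
    """Bit length of the field size 3^(6n), computed exactly with integers."""
    return (3 ** (6 * n)).bit_length()

for n in (19, 31, 47, 61, 71):
    print(f"n = {n}: GF(3^(6*{n})) is a {field_bits(n)}-bit field")
```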

4.1 Collection of Relations 

In the collection of relations step, we collect many double smooth pairs (r, s). The naive way to collect them is to factor the two polynomials, the rational polynomial rm + s and its algebraic counterpart, for all pairs (r, s). This is not practical, since we would have to factor them about $(3^6)^B \times (3^6)^{B+1}$ times. To reduce the number of factorizations, we use a sieving method. The idea of sieving is to factor the two polynomials only for pairs (r, s) that have a high probability of being double smooth. Such a pair is called a candidate.
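To see the scale involved, here is a small calculation. It assumes the count above means $(3^6)^B$ choices of r and $(3^6)^{B+1}$ choices of s. It also shows the $(3^6)^2$-fold growth of the sieving area when B increases by one, which is mentioned again in Section 4.4.

```python
Q = 3 ** 6                               # number of elements of GF(3^6)

def sieving_area(B):
    """(3^6)^B choices of r times (3^6)^(B+1) choices of s."""
    return Q ** B * Q ** (B + 1)

for B in (1, 2, 3):
    print(B, sieving_area(B), f"about 2^{sieving_area(B).bit_length() - 1}")
```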

The polynomial sieve [10] and the lattice sieve [26] are well-known sieving algorithms. Although the lattice sieve has been used in some FFS experiments [1, 2, 15, 16], we implemented the polynomial sieve. In JL06-FFS, r is fixed as a monic polynomial, which the polynomial sieve allows, whereas the lattice sieve cannot fix either r or s.

Polynomial Sieve. We describe the polynomial sieve for the rational polynomial rm + s; the algebraic polynomial can be sieved with the same procedure. Moreover, we only discuss the case where s is fixed and omit the details for fixed r. For a fixed s, we can determine r such that rm + s is divisible by $p \in B_n$ or by a power of p, where $\deg(p) \le B$. Then $(r + kp)m + s = (rm + s) + kpm$ with $k \in GF(3^6)[x]$ is also divisible by p. Hence, we can obtain all r of degree at most B such that rm + s is divisible by p. After computing all such r for each p, we obtain the pairs (r, s) such that rm + s is divisible by some p. If the sum of the degrees of all p dividing rm + s reaches $\deg(rm + s)$, then rm + s is very likely to be B-smooth, and the pair (r, s) becomes a candidate.
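Below is a toy version of the sieve just described. It makes several simplifying assumptions:
- Coefficients are in GF(3) instead of $GF(3^6)$.
- The residues $r_0 \bmod q$ with $q \mid r_0 m + s$ are found by brute force; a real sieve would compute $r_0 \equiv -s\,m^{-1} \pmod q$.
- The values of m and s, and all function names, are hypothetical.

Each r gets $\deg(p)$ added for every sieved prime power $p^e$ of degree at most B that divides rm + s. As a check, the resulting candidates are compared with a brute-force factorisation.

```python
from itertools import product

P = 3  # toy coefficient field GF(3); the paper works over GF(3^6)

def trim(a):
    """Normalise a coefficient list (lowest degree first): reduce mod P, drop leading zeros."""
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return tuple(a)

def deg(a):
    return len(a) - 1                        # the zero polynomial () has degree -1

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def mul(a, b):
    out = [0] * max(len(a) + len(b) - 1, 0)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return trim(out)

def pdivmod(a, m):
    """Quotient and remainder of a by a monic polynomial m."""
    a, quot = list(a), [0] * max(len(a) - len(m) + 1, 0)
    while len(a) >= len(m):
        c, shift = a[-1], len(a) - len(m)
        quot[shift] = c
        for i, mi in enumerate(m):
            a[shift + i] -= c * mi
        a = list(trim(a))
    return trim(quot), tuple(a)

def polys_upto(d):
    """All polynomials of degree <= d (d >= 0), the zero polynomial included."""
    return [trim(c) for c in product(range(P), repeat=d + 1)]

def monic_irreducibles(d):
    monic = [tuple(c) + (1,) for c in product(range(P), repeat=d)]
    return [f for f in monic
            if all(pdivmod(f, g)[1] for e in range(1, d // 2 + 1)
                   for g in (tuple(c) + (1,) for c in product(range(P), repeat=e)))]

def sieve_candidates(m, s, B):
    """Fixed s: score every r (deg r <= B) by the degrees of sieved prime powers dividing rm + s."""
    score = {r: 0 for r in polys_upto(B)}
    for d in range(1, B + 1):
        for p in monic_irreducibles(d):
            q = p
            while deg(q) <= B:                               # every power p^e of degree <= B
                for r0 in polys_upto(deg(q) - 1):            # residues r0 with q | r0 m + s
                    if not pdivmod(add(mul(r0, m), s), q)[1]:
                        for k in polys_upto(B - deg(q)):     # all r = r0 + k q with deg r <= B
                            score[add(r0, mul(k, q))] += d
                q = mul(q, p)
    return [r for r, sc in score.items()
            if add(mul(r, m), s) and sc == deg(add(mul(r, m), s))]

def sieve_smooth(f, B):
    """Brute force: f != 0 and every prime-power part p^e of f has degree <= B."""
    if not f:
        return False
    for d in range(1, B + 1):
        for p in monic_irreducibles(d):
            e = 0
            while True:
                quot, rem = pdivmod(f, p)
                if rem:
                    break
                f, e = quot, e + 1
            if e * d > B:
                return False
    return deg(f) == 0

m, s, B = (2, 1, 0, 1), (1, 1), 2            # m = x^3 + x + 2, s = x + 1 (toy values)
print(sieve_candidates(m, s, B))
```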

In this procedure, the most time-consuming task is to compute r + kp for all $k \in GF(3^6)[x]$ such that $\deg(r + kp) \le B$. In characteristic two, Gordon and McCurley proposed a method using binary Gray codes [10] to compute these r + kp. Using ternary Gray codes, we can also compute them efficiently in characteristic three.
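The Gray-code trick can be sketched as follows. This is a toy version that assumes GF(3) coefficients and k of degree below n; all names are hypothetical. A reflected ternary Gray code changes exactly one digit of k by ±1 per step. As a result, each new $r + kp$ costs a single update $r \leftarrow r \pm x^i p$ instead of a full multiplication.

```python
from itertools import product

P = 3

def ternary_gray(n):
    """Reflected ternary Gray code: every vector in {0,1,2}^n exactly once, consecutive
    vectors differing in a single digit by +1 or -1 (digit i = coefficient of x^i in k)."""
    if n == 0:
        yield ()
        return
    prev = list(ternary_gray(n - 1))
    for top in range(P):
        for code in (prev if top % 2 == 0 else reversed(prev)):
            yield code + (top,)

def norm(a):
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return tuple(a)

def add_shifted(r, p, i, c):
    """Return r + c * x^i * p over GF(3); coefficient lists, lowest degree first."""
    out = list(r) + [0] * max(0, i + len(p) - len(r))
    for j, pj in enumerate(p):
        out[i + j] = (out[i + j] + c * pj) % P
    return out

def translates(r0, p, n):
    """Yield r0 + k p for all k in GF(3)[x] with deg k < n, one polynomial update per step."""
    r, prev = [c % P for c in r0], (0,) * n
    for code in ternary_gray(n):
        for i, (old, new) in enumerate(zip(prev, code)):
            if old != new:                   # exactly one digit changes (none at the start)
                r = add_shifted(r, p, i, new - old)
        prev = code
        yield norm(r)

r0, p = (2, 1), (1, 0, 1)                   # toy values: r0 = x + 2, p = x^2 + 1
found = list(translates(r0, p, 3))          # all r0 + k p with deg k < 3
direct = set()
for k in product(range(P), repeat=3):       # the same set, computed from scratch
    r = list(r0)
    for i, c in enumerate(k):
        r = add_shifted(r, p, i, c)
    direct.add(norm(r))
print(len(found), set(found) == direct)     # 27 True
```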

In the polynomial sieve, we sieve with all powers of p whose degree is at most B. Since B is very small (1 or 2 in our experiments), the only proper power we need is $p^2$ with $\deg(p) = 1$. Such powers are few, since there are only $3^6$ monic irreducible polynomials of degree 1 in $GF(3^6)[x]$. In this way, every candidate we obtain generates a relation in Equation (5), except when r and s are not relatively prime. Thus, we only check the greatest common divisor of r and s, and we do not need to test the smoothness of the two polynomials with the B-smooth test [10].

Free Relation. Consider how a divisor (p), where $p \in B_n$, factors into divisors in $GF(3^6)[x, y]/(H)$. That is, suppose we obtain a congruence of the form

$$H(x, y) \equiv \prod_{i=1}^{d} (y - t_i) \pmod{p},$$

where d is the degree of H(x, y) in y. From this we can obtain a relation virtually for free, without the sieving procedure. We call such a relation a free relation.

The number of free relations depends on the degree d of H(x, y) in y and on the characteristic of the field treated in the FFS. In fact, free relations are numerous in many cases, and their number increases when the characteristic is small. For example, in the case of $GF(3^{6n})$ and $H(x, y) = y^6 + x$, there are about $\#B_a/2$ free relations, where $\#B_a$ is the size of the algebraic factorbase. This is because $y^6 + x$ often factors as $(y - t_1)^3 (y - t_2)^3$ modulo p.
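The following short derivation is our own reasoning, not taken from the source. It shows where the shape $(y - t_1)^3 (y - t_2)^3$ comes from. Let p be monic irreducible, let $\mathbb{F} = GF(3^6)[x]/(p)$ be the residue field, and let $\alpha$ denote the image of $-x$ in $\mathbb{F}$:

$$
\begin{aligned}
&H(x, y) = y^6 + x \equiv y^6 - \alpha \pmod{p},\\
&\alpha = a^3 \text{ for a unique } a \in \mathbb{F} \quad (\text{cubing is a bijection of the finite field } \mathbb{F} \text{ of characteristic } 3),\\
&y^6 - a^3 = (y^2 - a)^3 \quad (\text{characteristic } 3),\\
&a = t^2 \neq 0 \;\Longrightarrow\; H(x, y) \equiv (y - t)^3 (y + t)^3 \pmod{p}, \quad \text{i.e. } t_1 = t,\ t_2 = -t.
\end{aligned}
$$

Exactly half of the nonzero elements of a finite field of odd characteristic are squares. So this splitting occurs for roughly half of the p, which fits the order of magnitude of the count quoted above.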

4.2 Linear Algebra 

In the linear algebra step, we solve the linear system given by the relations. Specifically, we construct a matrix from the relations and reduce it to a much smaller one using the Galois action. We then solve the reduced linear system modulo $(3^{6n} - 1)/(3^6 - 1)$ by applying the parallel Lanczos method described in [3]. In this section, we describe the Galois action and our approach to parallelizing the matrix operations.

Galois Action. Here, we reduce the number of unknowns in the linear system using the Galois action presented in [18].

Let M' be the matrix given by the relations. Its i-th row $M'_i$ corresponds to the i-th relation, and its j-th column corresponds to the factorbase element $p_j$. To use the Galois action, we choose a polynomial $f \in GF(3^6)[x]$ of degree n all of whose coefficients lie in GF(3), and we construct $GF(3^{6n})$ as $GF(3^6)[x]/(f)$. Let $\phi$ be the Frobenius power $\phi(\xi) = \xi^{3^n}$. Since $\phi$ fixes the element x in $GF(3)[x]/(f)$, we also have $\phi(x) = x$ in $GF(3^6)[x]/(f)$ by the assumption on f. However, for an element $c \in GF(3^6) \setminus GF(3)$, $\phi$ does not fix c in $GF(3^6)[x]/(f)$, because n is coprime to 6. Each $p_j \in B_n$ is a monic irreducible polynomial of degree at most B, and for convenience we assume B = 1. Then $p_j = x + c_j$ with $c_j \in GF(3^6)$, so we have

$$\phi(p_j) = \phi(x + c_j) = x + \phi(c_j)$$

in $GF(3^6)[x]/(f)$. If $c_j$ is not in GF(3), then $c_j \neq \phi(c_j)$ in $GF(3^6)[x]/(f)$. This implies that, in general, many unknowns of the linear system can be rewritten in terms of other unknowns via the Galois action. Clearly, for such $p_j$, there exists $p_{j'}$ satisfying

$$\log_\gamma p_{j'} = \log_\gamma \phi(p_j) = 3^n \log_\gamma p_j, \qquad (12)$$

where $p_j \neq p_{j'}$. The second equality holds because $\phi(p_j) = p_j^{3^n}$ in $GF(3^{6n})$. Therefore, we can remove the j'-th column $M'^{(j')}$ and replace the j-th column by $M'^{(j)} + 3^n M'^{(j')}$. We denote the resulting reduced matrix by $M^*$. Note that this technique is also used for the algebraic factorbase. Consequently, the number of unknowns is about 1/6 of the original, and thus the number of required relations is reduced to about 1/6.

In our implementation, we do not reduce the factorbase in the sieving phase, so the sieving computation is the same as without the Galois action. After sieving, we compress the obtained relations by rewriting elements of the factorbase via the Galois action as in Equation (12). This reduces the factorbase to about 1/6 of its size, and it hardly lowers the probability of obtaining a relation. Hence, this technique makes the collection of relations step about 6 times faster than before, and the linear algebra step about $6^2$ times faster.
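The column merge can be sketched under a few assumptions:
- B = 1, so each column is indexed by the c in p = x + c.
- $GF(3^6)$ is represented as $GF(3)[z]/(z^6 + 2z + 2)$, as in Section 4.3.
- n = 71.
- The modulus N and the sample relation are hypothetical.

The sketch computes the orbits of $\phi$ on $GF(3^6)$ and rewrites a relation in terms of one representative per orbit. Each coefficient is multiplied by the factor $3^{nk}$ obtained by iterating Equation (12).

```python
from collections import Counter
from itertools import product

P, K = 3, 6   # GF(3^6) = GF(3)[z]/(z^6 + 2z + 2), the representation given in Section 4.3

def gf_mul(a, b):
    """Multiply two GF(3^6) elements stored as 6-tuples of GF(3) digits (z^0 first)."""
    prod = [0] * (2 * K - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    for k in range(2 * K - 2, K - 1, -1):   # reduce with z^6 = z + 1 (i.e. -2z - 2 mod 3)
        c = prod[k] % P
        prod[k] = 0
        prod[k - K] += c                    # z^k = z^(k-6) + z^(k-5)
        prod[k - K + 1] += c
    return tuple(c % P for c in prod[:K])

def frobenius(c, n):
    """phi(c) = c^(3^n); because c^(3^6) = c in GF(3^6), n mod 6 cubings suffice."""
    for _ in range(n % K):
        c = gf_mul(gf_mul(c, c), c)
    return c

def orbit(c, n):
    """The orbit [c, phi(c), phi^2(c), ...] of c under phi."""
    orb = [c]
    while True:
        nxt = frobenius(orb[-1], n)
        if nxt == c:
            return orb
        orb.append(nxt)

def rep_and_power(c, n):
    """Return (rep, k) with c == phi^k(rep); rep is the smallest element of the orbit."""
    orb = orbit(c, n)
    i = orb.index(min(orb))                 # rep = phi^i(c), hence c = phi^(L - i)(rep)
    return orb[i], (len(orb) - i) % len(orb)

def reduce_row(row, n, N):
    """Merge the columns of one relation {c: coefficient} (column c stands for x + c),
    using log(x + phi^k(rep)) = 3^(n k) * log(x + rep) mod N, as in Equation (12)."""
    out = {}
    for c, coeff in row.items():
        rep, k = rep_and_power(c, n)
        out[rep] = (out.get(rep, 0) + coeff * pow(3, n * k, N)) % N
    return {rep: v for rep, v in out.items() if v}

ELEMENTS = list(product(range(P), repeat=K))
sizes = Counter(len(orbit(c, 71)) for c in ELEMENTS)
reps = {rep_and_power(c, 71)[0] for c in ELEMENTS}
print(sorted(sizes.items()), len(reps))
```

The 729 elements fall into 130 orbits, a reduction of roughly 5.6x, consistent with "about 1/6". The shortfall comes from the 33 elements lying in the subfields GF(9) and GF(27), whose orbits are shorter.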
Parallel Lanczos method. We first rearrange the reduced matrix $M^*$ to balance the workload, and then apply the parallel Lanczos method to it. Before explaining the rearrangement, we describe the parallel computation. Assume that there are four nodes $N_{1,1}, N_{1,2}, N_{2,1}, N_{2,2}$, each with 4 or 8 cores. As shown in Figure 3, we partition the rearranged matrix M into four submatrices $M_{i,j}$, and each $M_{i,j}$ is allotted to node $N_{i,j}$. The given vector v is also partitioned into $v_1$ and $v_2$, and $v_j$ is given to both nodes $N_{1,j}$ and $N_{2,j}$. Moreover, $M_{i,j}$ is partitioned into L matrices $A_\ell$ when $N_{i,j}$ has L cores.





Fig. 3. Partitioning M into four matrices $M_{i,j}$ and each $M_{i,j}$ into L matrices $A_\ell$

We now describe the Lanczos method. The Lanczos method works only with a symmetric matrix, whereas the given matrix M is usually non-symmetric. Therefore, we solve a linear system of the form $M^T M v = a$, where v is the unknown column vector consisting of the logarithms of the factorbase elements and a is the given column vector. Since computing $M^T M$ explicitly is not efficient, we compute the vector $u = Mv$ and then $M^T u$. For more details about this computation, see [22].
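Here is a minimal sketch of the scalar (non-block) Lanczos iteration over a prime field GF(q). It assumes a dense matrix and uses a hypothetical 61-bit prime q. As described above, $A = M^T M$ is applied only through $u = Mv$ followed by $M^T u$, never formed explicitly. The paper's solver is a parallel variant working modulo the $N_i$, which this sketch does not reproduce.

```python
import random

def apply_AtA(M, v, q):
    """Return M^T (M v) mod q without forming M^T M: first u = M v, then M^T u."""
    u = [sum(mij * vj for mij, vj in zip(row, v)) % q for row in M]
    return [sum(M[i][j] * u[i] for i in range(len(M))) % q for j in range(len(v))]

def lanczos_solve(M, b, q):
    """Solve (M^T M) x = b over GF(q), q prime, with the scalar Lanczos recurrence."""
    dot = lambda a, c: sum(x * y for x, y in zip(a, c)) % q
    inv = lambda a: pow(a, q - 2, q)
    x = [0] * len(b)
    w, w_prev, Aw_prev, d_prev = list(b), None, None, None
    while any(w):
        Aw = apply_AtA(M, w, q)
        d = dot(w, Aw)
        if d == 0:
            raise ArithmeticError("Lanczos breakdown; retry with a randomised system")
        coef = dot(w, b) * inv(d) % q          # component of the solution along w
        x = [(xi + coef * wi) % q for xi, wi in zip(x, w)]
        c1 = dot(Aw, Aw) * inv(d) % q          # makes the next w A-orthogonal to w
        new_w = [(a - c1 * wi) % q for a, wi in zip(Aw, w)]
        if w_prev is not None:                 # ... and A-orthogonal to the previous w
            c2 = dot(Aw, Aw_prev) * inv(d_prev) % q
            new_w = [(nw - c2 * wp) % q for nw, wp in zip(new_w, w_prev)]
        w_prev, Aw_prev, d_prev, w = w, Aw, d, new_w
    return x

q = 2 ** 61 - 1                                # hypothetical prime modulus for the demo
rng = random.Random(7)
M = [[rng.randrange(q) for _ in range(6)] for _ in range(9)]
x0 = [rng.randrange(q) for _ in range(6)]
b = apply_AtA(M, x0, q)
x = lanczos_solve(M, b, q)
print(x == x0)
```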

After partitioning M, we perform a parallel computation of $u := Mv$ and $w := M^T u$ using the $M_{i,j}$. Let $v_1, v_2, u_1$, and $u_2$ be the partitioned vectors such that $v = v_1 \oplus v_2$ and $u = u_1 \oplus u_2$. By Algorithm 1, node $N_{i,j}$ obtains the partitioned vector $w_j$ such that $w = w_1 \oplus w_2$, where $i, j \in \{1, 2\}$, $i' = 3 - i$, and $j' = 3 - j$.

Steps 4, 5, and 6 describe the computation of $M^T u$. Note that in each node $N_{i,j}$, we regard the columns of $M_{i,j}$ as the rows of $M_{i,j}^T$. We therefore do not have to transpose $M_{i,j}$ explicitly, which avoids unnecessary operations.

Algorithm 1. (Computation with node $N_{i,j}$.)

Input: the partitioned matrix $M_{i,j}$ and the partitioned vector $v_j$.

Output: the partitioned vector $w_j$ such that $w_1 \oplus w_2 = M^T M v$, where j is equal to 1 or 2.

[Step for computation of $u := Mv$]

1. $u_{i,j} := M_{i,j} v_j$.

2. Give $u_{i,j}$ to node $N_{i,j'}$ and receive $u_{i,j'}$ from $N_{i,j'}$.

3. $u_i := u_{i,j} + u_{i,j'}$.

[Step for computation of $w := M^T u$]

4. $w_{i,j} := M_{i,j}^T u_i$.

5. Give $w_{i,j}$ to node $N_{i',j}$ and receive $w_{i',j}$ from $N_{i',j}$.

6. $w_j := w_{i,j} + w_{i',j}$.
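Algorithm 1 can be simulated in a single process. The four nodes become dictionary entries, and the message exchanges in steps 2 and 5 become plain additions. The matrix sizes, the small prime q, and the function names are hypothetical. The check confirms that the two output blocks concatenate to $M^T M v$.

```python
import random

def matvec(A, v, q):
    """A v mod q for a dense list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) % q for row in A]

def matTvec(A, u, q):
    """A^T u mod q, reading the columns of A as rows instead of forming A^T."""
    return [sum(A[i][j] * u[i] for i in range(len(A))) % q for j in range(len(A[0]))]

def vadd(a, b, q):
    return [(x + y) % q for x, y in zip(a, b)]

def algorithm1(blocks, v_parts, q):
    """Simulate Algorithm 1; blocks[i][j] plays the role of M_{i+1,j+1} on node N_{i+1,j+1}.
    Returns [w_1, w_2], whose concatenation equals M^T M v."""
    # Steps 1-3: u_{i,j} = M_{i,j} v_j; nodes N_{i,1} and N_{i,2} exchange and add.
    u_part = {(i, j): matvec(blocks[i][j], v_parts[j], q) for i in range(2) for j in range(2)}
    u = [vadd(u_part[i, 0], u_part[i, 1], q) for i in range(2)]
    # Steps 4-6: w_{i,j} = M_{i,j}^T u_i; nodes N_{1,j} and N_{2,j} exchange and add.
    w_part = {(i, j): matTvec(blocks[i][j], u[i], q) for i in range(2) for j in range(2)}
    return [vadd(w_part[0, j], w_part[1, j], q) for j in range(2)]

q = 10007                                    # hypothetical small prime modulus
rng = random.Random(1)
M = [[rng.randrange(q) for _ in range(7)] for _ in range(9)]
v = [rng.randrange(q) for _ in range(7)]
R, C = (range(0, 5), range(5, 9)), (range(0, 3), range(3, 7))   # 2 x 2 partition of M
blocks = [[[[M[r][c] for c in C[j]] for r in R[i]] for j in range(2)] for i in range(2)]
w1, w2 = algorithm1(blocks, [v[:3], v[3:]], q)
print(w1 + w2 == matTvec(M, matvec(M, v, q), q))
```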
We have discussed the parallel computation among nodes; we now move on to the parallel computation among cores in one node. Here, $A_\ell$ ($1 \le \ell \le L$) denotes the ℓ-th block of rows of $M_{i,j}$, so that $M_{i,j}$ is the vertical concatenation of $A_1, \dots, A_L$, where L is the number of cores in the node. By Algorithm 2, we can easily obtain each $A_\ell v_j$, and we then set $u_{i,j} = (A_1 v_j, \dots, A_L v_j)^T$. Similarly, by Algorithm 3, we obtain each $A_\ell^T u_i^{(\ell)}$, where $u_i^{(\ell)}$ is the block of $u_i$ matching the rows of $A_\ell$, and compute $w_{i,j} = \sum_{\ell=1}^{L} A_\ell^T u_i^{(\ell)}$.

Algorithm 2. (Parallel computation of $M_{i,j} v_j$ among L cores in the same node.)
Input: the partitioned matrix $A := M_{i,j}$ of size $s \times t$ and the partitioned t-vector $v_j$.

Output: the partitioned vector $u_{i,j}$ such that $u_{i,j} = A v_j$.

1. Compute $b_\ell := A_\ell v_j$ for $\ell = 1$ to $L$ in parallel.

2. $u_{i,j} := b_1 \oplus \cdots \oplus b_L$.

Algorithm 3. (Parallel computation of $M_{i,j}^T u_i$ among L cores in the same node.)
Input: the partitioned matrix $A := M_{i,j}$ of size $s \times t$ and the partitioned s-vector $u_i$.

Output: the partitioned vector $w_{i,j}$ such that $w_{i,j} = A^T u_i$.

1. Compute $c_\ell := A_\ell^T u_i^{(\ell)}$ for $\ell = 1$ to $L$ in parallel, where $u_i^{(\ell)}$ is the block of $u_i$ matching the rows of $A_\ell$.

2. $w_{i,j} := \sum_{\ell=1}^{L} c_\ell$.
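The core-level split of Algorithms 2 and 3 is sketched below with Python's `concurrent.futures.ThreadPoolExecutor`. This only illustrates the data flow. Python threads do not give a real speedup for this arithmetic, and the paper's implementation presumably uses native threads. The matrix, vectors, and helper names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def split_rows(A, L):
    """Cut A into at most L consecutive row blocks A_1, ..., A_L; also return the block height."""
    h = -(-len(A) // L)                       # ceil(len(A) / L)
    return [A[k:k + h] for k in range(0, len(A), h)], h

def algorithm2(A, v, L, q):
    """u = A v: core l computes b_l = A_l v, then the b_l are concatenated."""
    blocks, _ = split_rows(A, L)
    with ThreadPoolExecutor(max_workers=L) as pool:
        parts = pool.map(lambda Al: [sum(a * x for a, x in zip(row, v)) % q for row in Al], blocks)
        return [y for part in parts for y in part]

def algorithm3(A, u, L, q):
    """w = A^T u: core l computes c_l = A_l^T u^(l) on its slice of u, then the c_l are summed."""
    blocks, h = split_rows(A, L)
    slices = [u[k:k + h] for k in range(0, len(u), h)]
    ncols = len(A[0])

    def partial(job):
        Al, ul = job
        return [sum(Al[i][j] * ul[i] for i in range(len(Al))) % q for j in range(ncols)]

    with ThreadPoolExecutor(max_workers=L) as pool:
        parts = list(pool.map(partial, zip(blocks, slices)))
    return [sum(col) % q for col in zip(*parts)]

q = 10007
rng = random.Random(2)
A = [[rng.randrange(q) for _ in range(4)] for _ in range(10)]
v = [rng.randrange(q) for _ in range(4)]
u = [rng.randrange(q) for _ in range(10)]
print(algorithm2(A, v, 4, q), algorithm3(A, u, 4, q))
```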

Using Algorithms 1, 2, and 3, we obtain the vector $M^T M v$. Therefore, we need to rearrange M so that the nodes have balanced amounts of computation for $M_{i,j} v_j$ and the other products. The amount of computation depends on the number of nonzero elements in the allotted matrix, and the nonzero elements of M are not uniformly distributed. In fact, the number of nonzero elements per column of M is unbalanced, whereas the number per row is balanced. Thus, we construct the new matrix M by sorting the columns of $M^*$ (defined in the paragraph on the Galois action), so that the number of nonzero elements in $M_{1,1}$ and $M_{2,1}$ is almost equal to that in $M_{1,2}$ and $M_{2,2}$. We apply a similar strategy to the parallel computation among cores in the same node: A is partitioned into 4 or 8 smaller matrices $A_\ell$ so that each $A_\ell$ has almost the same number of nonzero elements.
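The text says the columns of $M^*$ are sorted to balance nonzeros, but it does not give the exact ordering. The sketch below shows one possible rule, a greedy "heaviest column to the lighter half" assignment, which is our hypothetical choice. The column weights are made-up numbers.

```python
def balance_columns(col_nnz):
    """Greedy split of column indices into two groups with nearly equal nonzero counts:
    visit columns from heaviest to lightest and give each to the currently lighter group."""
    groups, loads = ([], []), [0, 0]
    for j in sorted(range(len(col_nnz)), key=lambda j: -col_nnz[j]):
        g = 0 if loads[0] <= loads[1] else 1
        groups[g].append(j)
        loads[g] += col_nnz[j]
    return groups, loads

col_nnz = [900, 400, 300, 120, 80, 60, 40, 30, 20, 10, 5, 5]   # hypothetical column weights
groups, loads = balance_columns(col_nnz)
print(groups, loads)
```

The same routine could be applied again inside a node to split rows among 4 or 8 cores. A greedy split of this kind leaves the two loads differing by at most the heaviest single column.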

4.3 Computation Results 

In this section, we describe our computational results for the 676-bit DLP in $GF(3^{6 \cdot 71})$, whose multiplicative group contains a subgroup of 112-bit prime order. We construct $GF(3^6)$ as $GF(3)[z]/(z^6 + 2z + 2)$. To represent elements of $GF(3^6)[x]$, we define a mapping $\psi: \mathbb{Z} \to GF(3^6)[x]$ such that $\psi^{-1}: z \mapsto 3,\ x \mapsto 3^6$.
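Below is a decoder for $\psi$. It assumes our reading of the definition: $\psi^{-1}$ substitutes $z = 3$ and $x = 3^6$ with digits in {0, 1, 2}, so base-3 digit $i + 6j$ of an integer is the coefficient of $z^i x^j$. Under that reading, the logarithm basis $\gamma = \psi(0x456)$ used below is a degree-1 polynomial in x. The helper names are ours.

```python
def psi(value, z_degree=6):
    """Decode an integer into a polynomial in GF(3^6)[x], assuming psi^{-1} replaces
    z by 3 and x by 3^6: base-3 digit i + 6j is the coefficient of z^i x^j.
    Returns the x-coefficients (x^0 first), each as six GF(3) digits (z^0 first)."""
    digits = []
    while value:
        value, d = divmod(value, 3)
        digits.append(d)
    digits += [0] * (-len(digits) % z_degree)      # pad the top x-coefficient
    return [digits[j:j + z_degree] for j in range(0, len(digits), z_degree)]

def fmt_coeff(c):
    """Pretty-print one GF(3^6) coefficient, highest power of z first."""
    parts = []
    for i in range(len(c) - 1, -1, -1):
        if c[i] == 0:
            continue
        if i == 0:
            parts.append(str(c[i]))
        else:
            mono = "z" if i == 1 else f"z^{i}"
            parts.append(mono if c[i] == 1 else f"{c[i]}{mono}")
    return " + ".join(parts) or "0"

gamma = psi(0x456)
print(" + ".join(f"({fmt_coeff(c)})x^{j}" for j, c in reversed(list(enumerate(gamma)))))
# (1)x^1 + (z^5 + z^4 + 2z^3 + z)x^0
```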

In the polynomial selection step, we set $H(x, y) = y^6 + x$ in order to use the Galois action. Moreover, we select $m \in GF(3^6)[x]$ all of whose coefficients are in GF(3), in order to construct an f whose coefficients are also in GF(3). By an easy computation, we obtain suitable m and f as follows:

m = ψ(0x456bc 60e76c11 1e679735 c929fc55),

f = ψ(0x9 2d3e5daf 5ac01130 4e6909f7 09cc8833 baa757d3
      17dc6f99 9c8b98b5 ab8baa01 d68ec151 aec39e2e ed081c79
      d851066b 3ffb2a4f a3e19c1e cef46675 0918a26d 9c7cacd4
      8d74ccfe 2c1d3b79 e81e6138 ab06aef4).

Then, $GF(3^{6 \cdot 71})$ is constructed as $GF(3^6)[x]/(f)$. With the smoothness bound B = 2, there are 266,085 elements in the rational factorbase and 265,721 elements in the algebraic factorbase, so we need to collect at least 531,806 relations. However, the sieving area for B = 2 is too small to yield that many relations.
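The rational factorbase size can be reproduced under one assumption: that it consists of all monic irreducible polynomials of degree at most B = 2 over $GF(3^6)$. This assumption matches the figure quoted above. The count then follows from Gauss's formula $\frac{1}{d}\sum_{e \mid d} \mu(e)\, q^{d/e}$. The algebraic factorbase size is copied from the text, not derived.

```python
def mobius(n):
    """Moebius function by trial division."""
    result, k = 1, 2
    while k * k <= n:
        if n % k == 0:
            n //= k
            if n % k == 0:
                return 0
            result = -result
        k += 1
    return -result if n > 1 else result

def count_monic_irreducible(q, d):
    """Gauss's formula: number of monic irreducible polynomials of degree d over GF(q)."""
    return sum(mobius(e) * q ** (d // e) for e in range(1, d + 1) if d % e == 0) // d

Q = 3 ** 6
B = 2
rational = sum(count_monic_irreducible(Q, d) for d in range(1, B + 1))
algebraic = 265_721                     # copied from the text, not derived here
print(rational, rational + algebraic)   # 266085 531806
```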

We settle this problem by using the Galois action, which considerably reduces the number of required factorbase elements, as described in Section 4.2. In fact, we need only 88,674 relations, which is about 1/6 of the originally required number.

Moreover, we use free relations, which are obtained without sieving. With $H(x, y) = y^6 + x$, H fortunately factors as $(y - t_1)^3 (y - t_2)^3 \pmod{p}$ for many elements p of the factorbase, and so there are 132,860 ($\approx \#B_a/2$) free relations. Even after deleting the many duplicates produced by the Galois action, 22,155 free relations remain. Thus, we only have to find at least 66,519 relations in the collection of relations step, which is about 1/8 of the originally required number.

In the collection of relations step, we use the polynomial sieve described in Section 4.1 and compute relations on 96 cores in total:
- five nodes, each consisting of Intel Quad-Core Xeon E5440 (2.83 GHz) × 2 CPUs with 16-GB RAM;
- one node consisting of Intel Quad-Core Xeon X5355 (2.66 GHz) × 2 CPUs with 16-GB RAM;
- twelve nodes, each consisting of Intel Quad-Core Xeon L5420 (2.33 GHz) × 1 CPU with 4-GB RAM.

In 18 days of computation, after removing duplicates, we found 66,646 relations. Thus, we obtained a total of 88,801 relations, which are enough to solve the linear system.
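The relation budget and the core count above can be checked with plain arithmetic. All numbers are copied from the text. The core count assumes four cores per quad-core CPU.

```python
needed_total = 266_085 + 265_721        # rational + algebraic factorbase sizes (B = 2)
needed_galois = 88_674                  # required after the Galois action
free_left = 22_155                      # free relations left after removing duplicates
found = 66_646                          # relations found by sieving (18 days)
cores = 5 * 2 * 4 + 1 * 2 * 4 + 12 * 1 * 4   # nodes x CPUs per node x 4 cores (quad-core)

to_sieve = needed_galois - free_left
print(needed_total, to_sieve, cores)
print(f"reduction: {needed_total / needed_galois:.2f}x by Galois, {needed_total / to_sieve:.2f}x overall")
print("enough relations:", found + free_left, ">=", needed_galois, found + free_left >= needed_galois)
```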

The linear system constructed from the relations has to be solved modulo $(3^{6 \cdot 71} - 1)/(3^6 - 1)$. However, the Lanczos method may fail when the modulus has a small prime factor. Therefore, we work modulo the following factors $N_i$ of $(3^{6 \cdot 71} - 1)/(3^6 - 1)$:

$$N_1 = (3^{2 \cdot 71} + 3^{71} + 1)/(13 \cdot 5113),$$

$$N_2 = (3^{2 \cdot 71} - 3^{71} + 1)/(7 \cdot 210019 \cdot 49682251 \cdot 55126531),$$

$$N_3 = (3^{71} + 1)/(2^2 \cdot 853 \cdot 2131 \cdot 82219),$$

$$N_4 = (3^{71} - 1)/2,$$

where every prime factor of each $N_i$ is larger than 30 bits and the $N_i$ are pairwise relatively prime.
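The stated factorization can be checked with exact integer arithmetic. This sketch verifies three things:
- each stated cofactor divides exactly;
- the $N_i$ are pairwise coprime;
- together with the small factors, they make up $(3^{6 \cdot 71} - 1)/(3^6 - 1)$.

It does not check the claim that every prime factor of $N_i$ exceeds 30 bits, which would require factoring.

```python
from itertools import combinations
from math import gcd

A = 3 ** 71
factorisation = {                      # (value, small cofactor) as stated in the text
    "N1": (A * A + A + 1, 13 * 5113),
    "N2": (A * A - A + 1, 7 * 210019 * 49682251 * 55126531),
    "N3": (A + 1, 2 ** 2 * 853 * 2131 * 82219),
    "N4": (A - 1, 2),
}
N = {}
for name, (value, small) in factorisation.items():
    quotient, remainder = divmod(value, small)
    N[name] = quotient
    print(name, "exact division:", remainder == 0, "bits:", quotient.bit_length())

coprime = all(gcd(a, b) == 1 for a, b in combinations(N.values(), 2))
product = N["N1"] * N["N2"] * N["N3"] * N["N4"]
cofactor = 5113 * 210019 * 49682251 * 55126531 * 853 * 2131 * 82219
print("pairwise coprime:", coprime)
print("(3^426 - 1)/(3^6 - 1) == N * cofactor:", (3 ** 426 - 1) // (3 ** 6 - 1) == product * cofactor)
```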

For the linear algebra, we use four clusters of four nodes each:
- one cluster whose nodes each consist of Intel Quad-Core Xeon E5440 (2.83 GHz) × 2 CPUs with 16-GB RAM;
- three clusters whose nodes each consist of Intel Quad-Core Xeon L5420 (2.33 GHz) × 1 CPU with 4-GB RAM.

With about 12 hours of computation on each cluster, we solve the linear system modulo $N_i$ via the parallel Lanczos method with the four nodes described in Section 4.2. Using the Chinese remainder theorem and the Galois action of $\phi$, we obtain the discrete logarithms of the factorbase elements modulo $N = N_1 N_2 N_3 N_4$.
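The Chinese remainder step can be sketched as follows for pairwise coprime moduli. The moduli here are toy stand-ins for $N_1, \dots, N_4$, and the function name is hypothetical. It uses Python's three-argument `pow` with exponent -1 (Python 3.8 or later) for the modular inverse.

```python
def crt(residues, moduli):
    """Combine x = r_i (mod m_i) for pairwise coprime m_i; return (x, prod m_i)."""
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        t = (r - x) * pow(M, -1, m) % m     # choose t with x + M t = r (mod m)
        x, M = x + M * t, M * m
    return x % M, M

moduli = (7, 11, 13)                        # toy stand-ins for N_1, ..., N_4
secret = 500
print(crt([secret % m for m in moduli], moduli))   # (500, 1001)
```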

In the individual logarithm step, our target is the element

$$\pi(x) = \psi(\lfloor \pi \times 10^{202} \rfloor) = (z^4 + z^3 + 2z^2 + 1)x^{70} + \cdots + (z^5 + 2z^4 + 2z^3 + z^2 + 2)$$

in basis $\gamma = \psi(0x456)$. We choose a representation of $\pi(x)$ in terms of elements of degree at most 7 as follows: $\gamma^t \pi(x) \equiv z_1/z_2 \pmod{f}$, where

z_1 = ψ(0x333) × ψ(0x345) × ψ(0x427) × ψ(0x436) × ψ(0x4c3)
      × ψ(0xd909 66c7e3ec) × ψ(0x293996d cc380672)
      × ψ(0x3ff378e 3d4659d0) × ψ(0x6 27d6c281 0a0fc5a2)
      × ψ(0x8 f4797e29 a9ec3b4a),

z_2 = ψ(0x318) × ψ(0x45 4c6fbfd4) × ψ(0x54 c69e6f97)
      × ψ(0x1686d 42782189) × ψ(0x3cf67a5 84055cd8)
      × ψ(0x8 f68ab2e2 5d2bc04f) × ψ(0xb cc56922c f651b383),

t = 0x2 0f822e8c ac48792a e2aea337 c9002b49 bbf1b864
      43a6111b 24c5593d e44daf43 e26de26e 1f85f982 1ba485b3
      beda74bd f782626d 6cd38bb2 8f829867 5dc04adc f8741c24,

and $z_1$, $z_2$ are 7-smooth. Then, we compute the logarithms of $z_1$ and $z_2$ in basis γ using the special-q descent technique [ttsnHi]. This took about 14 days on five nodes, each consisting of Intel Quad-Core Xeon E5440 (2.83 GHz) × 2 CPUs with 16-GB RAM, and one node consisting of Intel Quad-Core Xeon X5355 (2.66 GHz) × 2 CPUs with 16-GB RAM. We obtained the logarithms

log_γ z_1 = 0x3 fc71c577 10be8e3f e7af0fba e00e711f 0ad6dd50
      38fb8f26 c0fadb3b 448cab2f 67671247 285f9e95 dc501717
      d9def844 a75f9e58 f04a9bd2 3a5d0fdb 8f8ebb9f fea4deea,

log_γ z_2 = 0x4 82febaec ae4382e0 e651f577 09df4e7d 99d99d34
      03db5d5e 521c4e2b da89ec33 6c9d45d6 2dd1f982 2f198fb2
      6c069414 3b0b1544 ece8e4b1 5304872f 6ff261fd 03b271c7

modulo N. Taking logarithms of $\gamma^t \pi(x) \equiv z_1/z_2$, we obtain $\log_\gamma \pi(x) \equiv \log_\gamma z_1 - \log_\gamma z_2 - t \pmod{N}$.
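We read the garbled definition of the target as $\pi(x) = \psi(\lfloor \pi \times 10^{202} \rfloor)$; this is our reconstruction. Under that reading, the target can be regenerated. The sketch computes $\lfloor \pi \cdot 10^{202} \rfloor$ exactly with Machin's formula and decodes it with the same assumed digit order for $\psi$ as above. It then prints the leading and constant coefficients for comparison with the expression in the text. The x-degree (70) and the leading z-degree (4) agree with that expression, which supports this reading but does not prove it.

```python
def floor_pi_times_10_pow(k, guard=20):
    """floor(pi * 10^k) with integer arithmetic, via Machin's formula
    pi = 16 arctan(1/5) - 4 arctan(1/239); guard digits absorb truncation error."""
    scale = 10 ** (k + guard)

    def arctan_inv(x):                      # arctan(1/x) * scale, alternating Taylor series
        total, term, n, sign = 0, scale // x, 1, 1
        while term:
            total += sign * (term // n)
            term //= x * x
            n += 2
            sign = -sign
        return total

    return (16 * arctan_inv(5) - 4 * arctan_inv(239)) // 10 ** guard

def psi(value, z_degree=6):
    """Base-3 digit i + 6j of value -> coefficient of z^i x^j (assumed reading of psi)."""
    digits = []
    while value:
        value, d = divmod(value, 3)
        digits.append(d)
    digits += [0] * (-len(digits) % z_degree)
    return [digits[j:j + z_degree] for j in range(0, len(digits), z_degree)]

target = psi(floor_pi_times_10_pow(202))
print("degree in x:", len(target) - 1)
print("leading coefficient digits (z^0..z^5):", target[-1])
print("constant coefficient digits (z^0..z^5):", target[0])
```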

The logarithms in the multiplicative subgroups of order less than 30 bits are computed with Pollard's ρ method in about a minute. Using the Pohlig-Hellman method, we then compute the logarithm $\log_\gamma \pi(x)$:

Table 3. Records for solving the DLP in finite fields

Finite field  Ref.       Date           Algorithm    Collection of relations            Linear algebra                     Timing    Bit size
GF(p)         [21]       Feb. 5, 2007   NFS*         Many CPUs§                         12-24 Xeon (3.2 GHz)               33 days   532
GF(2^n)       [15]       Sep. 22, 2005  JL02-FFS     4 nodes of 16 Itanium 2 (1.3 GHz)  4 nodes of 16 Itanium 2 (1.3 GHz)  17 days   613
GF(p^3)       [20]       Aug. 23, 2006  JLSV06-NFS†  16 Alpha processors (1.15 GHz)     16 Alpha processors (1.15 GHz)     19 days   394
GF(p^30)      [18]       Nov. 9, 2005   JL06-FFS-2‡  16 Alpha processors (1.15 GHz)     16 Alpha processors (1.15 GHz)     12 hours  556
GF(3^{6n})    This work  Dec. 9, 2009   JL06-FFS     Xeon (2.83 GHz), 96 cores          Xeon (2.83 GHz), 80 cores          33 days   676

*NFS: Number Field Sieve [19, 17]. †JLSV06-NFS: NFS in the medium prime case [20]. ‡See footnote 2 on page 2. §There are no detailed descriptions of computational resources in the reference cited for this record.


log_γ π(x) = 0x8 T8b54T9T 2fb6ff9b 57add5d5 11f69de6 a3853f98
      68d53cc0 5b531076 2872ac6a 320874bf ba6d66d6 8e5e245f
      39778f02 31ae791a acbab8c7 5ee6850c 9f5df0e5 f6b8ab0b
      95d8bdb1 aea95b1f bad82465 25590f66,

which completely solves the 676-bit DLP in $GF(3^{6 \cdot 71})$.
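For the small subgroups mentioned above, a textbook Pollard ρ for discrete logarithms can be sketched as follows. It uses Floyd cycle finding and a 3-way partition of the group, which is the standard choice and not necessarily the authors' implementation. The toy group is the order-1019 subgroup of $\mathbb{Z}_{2039}^*$ generated by 4; these parameters are hypothetical.

```python
import random

def pollard_rho_log(g, h, p, r, seed=0):
    """Find x with g^x = h (mod p), where g has prime order r (Pollard's rho, Floyd)."""
    rng = random.Random(seed)

    def step(y, a, b):                       # invariant: y = g^a * h^b (mod p)
        s = y % 3
        if s == 0:
            return y * y % p, 2 * a % r, 2 * b % r
        if s == 1:
            return y * g % p, (a + 1) % r, b
        return y * h % p, a, (b + 1) % r

    while True:
        a, b = rng.randrange(r), rng.randrange(r)
        y = pow(g, a, p) * pow(h, b, p) % p
        Y, A, Bb = y, a, b
        while True:                          # tortoise takes one step, hare takes two
            y, a, b = step(y, a, b)
            Y, A, Bb = step(*step(Y, A, Bb))
            if y == Y:
                break
        if (Bb - b) % r:                     # g^a h^b = g^A h^Bb  =>  x = (a - A)/(Bb - b)
            return (a - A) * pow(Bb - b, -1, r) % r

print(pollard_rho_log(4, pow(4, 777, 2039), 2039, 1019))   # 777
```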

4.4 For Larger Extension Degrees 

We have solved the DLP in $GF(3^{6n})$ for n in the experimental class, where the smoothness bound B (i.e., Bqq) is less than or equal to 2 (ref. Tabled]). Note that the size of the sieving area increases $(3^6)^2$-fold when the smoothness bound B increases by one (see Form (HU)). However, we expect that, with B = 3, the DLP in $GF(3^{6 \cdot 97})$ might be solved within several years by using dozens of computational resources like ours. This would also require various techniques such as the large prime variation, block sieving, sieving via bucket sort [29, 4], and SIMD implementation.

5 Concluding Remarks 

In this study, we implemented a new variant of the FFS in $GF(3^{6n})$ (n prime), proposed by Joux and Lercier in 2006 [18]. We compared it in practical experiments with the earlier variant, also proposed by Joux and Lercier, in 2002 [15]. For the DLP in $GF(3^{6n})$, these two variants of the FFS have the same asymptotic complexity, but we expected the new variant to be more efficient than the earlier one for some extension degrees n. Our experimental results confirmed this expectation for the extension degrees n = 19 and n = 61. Moreover, with our implementations, we succeeded in solving the 676-bit DLP in $GF(3^{6 \cdot 71})$ with about 33 days of computation.












366 T. Hayashi et al. 


We have experimented with the DLP in $GF(3^{6n})$, which arises in pairing-based cryptosystems. The security of pairing-based cryptosystems relies on the difficulty of the DLP in various finite fields, for example, $GF(2^{4m})$ and $GF(p^{12})$. Table 3 presents the current records for solving the DLP in various finite fields. The DLPs used for pairing-based cryptosystems have not yet been examined at practical key sizes. It is an open problem to analyze the hardness of the DLP with practical key sizes in such finite fields.
Abstract. The Pollard kangaroo method solves the discrete logarithm problem (DLP) in an interval of size N with heuristic average case expected running time approximately $2\sqrt{N}$ group operations. It is well known that the Pollard rho method can be sped up by using equivalence classes (such as orbits of points under an efficiently computed group homomorphism), but such ideas have not been used for the DLP in an interval. Indeed, it seems impossible to implement the standard kangaroo method with equivalence classes.

The main result of the paper is an algorithm, building on work of Gaudry and Schost, that solves the DLP in an interval of size N with heuristic average case expected running time close to $1.36\sqrt{N}$ group operations for groups with fast inversion. In practice the algorithm is not quite this fast, due to the usual problems with pseudorandom walks such as fruitless cycles. In addition, we present experimental results.

Keywords: discrete logarithm problem (DLP), elliptic curves, negation 
map, efficiently computable group homomorphisms. 

1 Introduction 

The discrete logarithm problem (DLP) in an interval is the following problem: given g, h in a group G and $N \in \mathbb{Z}_{>0}$ such that $h = g^n$ for some $0 \le n < N$ (where N is less than the order of g), compute n. This problem arises naturally in a number of contexts, for example:
- the DLP with c-bit exponents (c-DLSE) [15, 23, 21];
- decryption in the Boneh-Goh-Nissim homomorphic encryption scheme [T];
- counting points on curves or abelian varieties over finite fields [T%];
- the analysis of the strong Diffie-Hellman problem [3, 17];
- side-channel or small subgroup attacks [mm].

One can solve the DLP in an interval using the baby-step-giant-step algorithm in at worst $2\sqrt{N}$ group operations (or, with minor modifications, with average case running time of $\sqrt{2N}$ group operations). But this method also requires $O(\sqrt{N})$ group elements of storage.
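A minimal baby-step-giant-step sketch for the interval problem follows. It uses $m = \lceil\sqrt{N}\rceil$ baby steps stored in a table and at most m giant steps, which matches the $2\sqrt{N}$ worst case and the $O(\sqrt{N})$ storage stated above. The toy group is the order-1019 subgroup of $\mathbb{Z}_{2039}^*$ generated by 4, a hypothetical choice.

```python
from math import isqrt

def bsgs_interval(g, h, N, p):
    """Find n in [0, N) with g^n = h (mod p) using at most 2*ceil(sqrt(N)) multiplications
    and a table of ceil(sqrt(N)) group elements."""
    m = isqrt(N - 1) + 1                     # m = ceil(sqrt(N))
    baby, e = {}, 1
    for j in range(m):                       # baby steps: remember g^j
        baby.setdefault(e, j)
        e = e * g % p
    giant = pow(g, -m, p)                    # g^(-m), needs Python 3.8+
    y = h
    for i in range(m):                       # giant steps: h * g^(-i m)
        if y in baby:
            return i * m + baby[y]
        y = y * giant % p
    return None

print(bsgs_interval(4, pow(4, 150, 2039), 300, 2039))   # 150
```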

Pollard [24] developed the kangaroo algorithm precisely with this application in mind. Using distinguished points, van Oorschot and Wiener [22] (also see Pollard [25]) achieve a heuristic average case expected complexity of essentially $2\sqrt{N}$ group operations and low storage. We summarise this algorithm in Appendix A. Note that this algorithm has success probability 1, as do all algorithms in this paper. These algorithms can also be parallelised (or distributed) with a linear speedup. For comparison, the Pollard rho method [21] has heuristic expected running time of $\sqrt{\pi r/2} \approx 1.25\sqrt{r}$ operations if g has order r. All complexity statements in this paper rely on heuristic assumptions; for steps toward a rigorous analysis of the kangaroo method, see Montenegro and Tetali [T5].

Gaudry and Schost [14] (building on earlier work of Gaudry and Harley [13]) presented a different approach to this problem, using a birthday paradox style analysis. Whilst their algorithm is not as fast as that of van Oorschot and Wiener, it is easily parallelisable and, importantly, there is no requirement to know the number of clients or processors before the algorithm begins. Parallelising the Gaudry-Schost algorithm gives a linear speedup, and this also applies to all the algorithms in this paper. For brevity we state all running times for the serial case. The average expected running time of their algorithm is $2.08\sqrt{N}$ group operations on a serial computer (the algorithm of Gaudry and Harley [13] is less efficient). We present their algorithm and recall the analysis of its complexity in Section 2.

Gallant, Lambert and Vanstone [HH] and Wiener and Zuccherato [29] showed that the Pollard rho method can be used with equivalence classes (orbits of group elements under an efficiently computable group homomorphism) to achieve a constant speedup in some groups. In particular, for elliptic curves the rho algorithm can be sped up by a factor of $\sqrt{2}$ using the equivalence class $\{u, u^{-1}\}$, where u is a group element (more commonly written as $\{P, -P\}$ in the case of elliptic curves). In practice, the running times are not so good, since the algorithms use pseudorandom walks which do not behave exactly like truly random walks. In particular, walks can fall into short cycles and hence never arrive at a distinguished point. These are called "fruitless cycles" and have been analysed by Duursma, Gaudry and Morain [5] and Bos, Kleinjung and Lenstra [2].

It seems to be impossible to combine the standard kangaroo method with equivalence classes in general (Section 19.6.3 of [5] claims it can be done but gives no details, and this seems to be an error). Hence, it is necessary to consider other algorithms. A natural observation is the following. For a DLP instance (g, h) in an interval of even length N, one can set $h' = hg^{-N/2}$ and then solve $h' = g^n$ where $-N/2 \le n \le N/2$. If the discrete logarithm of u lies in the interval $[-N/2, N/2]$, then the equivalence class $\{u, u^{-1}\}$ does correspond to a pair of group elements in the region of interest.

Very recently Pollard [25] developed two new variants of the kangaroo method which require inversion of just one or two group elements. Pollard's three and four kangaroo variants have heuristic running times of roughly $1.82\sqrt{N}$ and $1.71\sqrt{N}$ group operations respectively. More details about these algorithms will appear in forthcoming joint work.

In Sections 3 and 4 we show how to speed up the Gaudry-Schost method in groups with fast inversion (such as elliptic curves, tori, LUC and XTR). Here fast inversion means that computing $u^{-1}$ for any u in the group is much faster than a general group operation. We also present a further speedup by modifying the search region. Our main result is a method to solve the DLP in an interval in approximately $1.36\sqrt{N}$ group operations. The result uses a new variant of the birthday paradox which is developed in [10]. The theoretical analysis of the algorithm assumes it is run using a truly random walk. In practice one implements the algorithm using a pseudorandom walk, which has a number of undesirable consequences, in particular the existence of fruitless cycles. In Section 5 we present experimental results which give a better idea of the actual performance in practice (though it is likely that these figures can be improved).

Our algorithm, as with Gaudry-Schost, requires low storage and can be parallelised with linear speedup very easily.

We indicate in Appendix B how to speed up the Gaudry-Schost algorithm for the multi-dimensional DLP using equivalence classes. A precise analysis of the algorithms in Appendix B is currently an open problem.

2 The Gaudry-Schost Algorithm 

To introduce notation and the central ideas, we recall the Gaudry-Schost algorithm [14]. The basic idea is the same as the kangaroo algorithm of Pollard in the van Oorschot and Wiener [22] formulation. Let g and h be the DLP instance we wish to solve, with $h = g^n$ for some integer $-N/2 \le n \le N/2$. We run a large number of pseudorandom walks (possibly distributed over a large number of processors). Half the walks are "tame walks", which means that every element in the walk is of the form $g^a$ where the integer a is known. The other half are "wild walks", which means that every element is of the form $hg^a$ where the integer a is known. As is typical in this subject, we visualise the group in terms of the 'exponent space'. More precisely, define the 'tame set'

$$T = [-N/2, N/2]$$

(where by $[N_1, N_2]$ we mean $\{a \in \mathbb{Z} : N_1 \le a \le N_2\}$) and the 'wild set'

$$W = n + T = \{n + a : a \in [-N/2, N/2]\}.$$

T and W have the same size, but W is a translate of T by n, and of course we do not know the value of n. A tame walk is a sequence of points $g^a$ where $a \in T$, and a wild walk is a sequence of points $g^b = hg^a$ with $b \in W$.

Each walk proceeds until a distinguished point is hit. This distinguished point is then stored on a server, together with the corresponding exponent a and a flag indicating which sort of walk it was. This data is analogous to the 'trap' set in the standard Pollard kangaroo method [24, 26]. When the same distinguished point is visited by two different types of walk, we have the "tame-wild collision" $g^{a_1} = hg^{a_2}$, and one solves the DLP as $n \equiv a_1 - a_2$ (modulo the order of g). We stress that the algorithm continues until the DLP is solved. Hence the probability of success is 1.

The main difference between the Gaudry-Schost algorithm and the kangaroo algorithm is what happens when a distinguished point is hit. Gaudry and Schost restart the walk from a random starting point in a certain range, whereas the kangaroos keep on running. The theoretical analysis is different too: Gaudry and Schost use a variant of the birthday paradox, whereas Pollard and van Oorschot and Wiener use a different probabilistic argument (see Appendix A).
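A toy sketch of the Gaudry-Schost algorithm as just described, working in the order-1019 subgroup of $\mathbb{Z}_{2039}^*$ generated by 4. Several choices are hypothetical and only for illustration:
- the step set;
- the distinguished-point rule ($x \equiv 0 \bmod 8$);
- the partition function;
- the walk-length cap used to abandon fruitless cycles.

Tame walks track $g^a$, wild walks track $hg^a$, and every walk restarts from a fresh random point once it stores a distinguished point. A tame-wild collision yields $n \equiv a_{\text{tame}} - a_{\text{wild}}$.

```python
import random

def gaudry_schost(g, h, N, p, r, seed=1):
    """Toy Gaudry-Schost: find n with h = g^n (mod p), -N/2 <= n <= N/2, g of prime order r."""
    rng = random.Random(seed)
    jumps = [1, 2, 3, 5, 8, 13, 21, 34]              # hypothetical small step sizes a_i
    mults = [pow(g, a, p) for a in jumps]
    theta_inv = 8                                     # distinguished if x % 8 == 0 (theta ~ 1/8)
    max_len = 40 * theta_inv                          # abandon walks stuck in fruitless cycles
    store = {}                                        # distinguished point -> (kind, exponent)
    kind = "tame"
    while True:
        a = rng.randint(-(N // 2), N // 2)           # random start in T (offset for wild walks)
        x = pow(g, a % r, p)
        if kind == "wild":
            x = x * h % p                             # wild point h * g^a
        for _ in range(max_len):
            if x % theta_inv == 0:                    # distinguished point reached
                if x in store and store[x][0] != kind:
                    b = store[x][1]
                    tame, wild = (a, b) if kind == "tame" else (b, a)
                    n = (tame - wild) % r             # g^tame = h * g^wild
                    return n if n <= r // 2 else n - r
                store.setdefault(x, (kind, a))
                break                                 # restart from a fresh random point
            i = (x // theta_inv) % len(jumps)         # partition class of x
            x = x * mults[i] % p
            a += jumps[i]
        kind = "wild" if kind == "tame" else "tame"

print(gaudry_schost(4, pow(4, 55, 2039), 200, 2039, 1019))   # 55
```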

2.1 Theoretical Analysis 

We now recall the precise analysis of the idealised version (i.e., using a truly random walk rather than a pseudorandom walk) of the Gaudry-Schost algorithm [14]. Gaudry and Schost use the following variant of the birthday paradox, which we will call the Tame-Wild birthday paradox.

Theorem 1. When sampling uniformly at random from a set of size $R \in \mathbb{N}$, with replacement, and alternately recording the element selected in 2 different lists, the expected number of selections that need to be made in total before we have a coincidence between the lists is $\sqrt{\pi R} + O(1)$.

Proofs of this theorem can be found in Selivanov [SHj] or in [27] (which derives it from a result of Nishimura and Sibuya [20]). For simplicity we will omit the $O(1)$ term from all subsequent running times.
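Theorem 1 can be sanity-checked by simulation. The sketch samples uniformly from a set of size R, alternately recording into a "tame" and a "wild" list, and averages the number of selections until the lists share an element. The set size, trial count, and seed are arbitrary choices.

```python
import math
import random

def tame_wild_trial(R, rng):
    """Sample uniformly from {0, ..., R-1}, alternately recording into a tame and a wild
    list; return the total number of selections until the two lists share an element."""
    lists = (set(), set())
    count = 0
    while True:
        side = count % 2                     # even selections are tame, odd ones wild
        x = rng.randrange(R)
        count += 1
        if x in lists[1 - side]:
            return count
        lists[side].add(x)

rng = random.Random(2010)
R, trials = 10_000, 2_000
average = sum(tame_wild_trial(R, rng) for _ in range(trials)) / trials
print(f"simulated {average:.1f} vs sqrt(pi R) = {math.sqrt(math.pi * R):.1f}")
```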

Since tame points lie in T and wild points lie in W, a collision between tame and wild points can only occur in $T \cap W$. We call such a collision 'tame-wild'. It is analogous to a coincidence between the lists in Theorem 1, so we apply Theorem 1 with $R = |T \cap W|$.



Fig. 1. Overlap between T and W. The sets T and W are represented by black horizontal bars and the shading between them shows the length of the overlap. The first case is when n = N/4 and the second case is n = N/2.

Figure 1 presents $T \cap W$ in two cases. The first case is $h = g^n$ for $n = N/4$, so $|T \cap W| = 3N/4$ (this is the 'average case'). The second case is $h = g^n$ for $n = N/2$, so $|T \cap W| = N/2$ (which is the 'worst case').

Theorem 2. (Gaudry-Schost [14]) Let notation be as above. If elements are sampled uniformly at random with replacement alternately from T and W and recorded, the expectation, over all problem instances, of the number of selections before a tame-wild collision is $2.08\sqrt{N}$.

Proof. The running time of the Gaudry-Schost algorithm depends on the problem instance $h = g^n$ but, by symmetry, we can restrict to the case $0 \le n \le N/2$. We write this as $h = g^{xN}$ where $x \in [0, 1/2]$.

Let $R = |T \cap W| = N(1 - x)$. By Theorem 1 we expect to need to sample $\sqrt{\pi R}$ elements (half of each type) of $T \cap W$ to find a collision. To select $\frac{1}{2}\sqrt{\pi R}$ elements in $T \cap W$ when sampling uniformly from T requires selecting

$$\frac{|T|}{|T \cap W|} \cdot \frac{1}{2}\sqrt{\pi R} = \frac{1}{2}\sqrt{\pi N/(1 - x)}$$

elements. The same argument applies to W. Hence, the expected running time of the algorithm is $\sqrt{\pi N/(1 - x)}$ group operations. Note that this is the expected value of the running time, over all choices for the random walk, for a specific problem instance.

We now average this over all problem instances as

$$2 \int_0^{1/2} \sqrt{\pi N}\,(1 - x)^{-1/2}\,dx = 2\sqrt{\pi N}\,\bigl(2 - \sqrt{2}\bigr) = 2(2 - \sqrt{2})\sqrt{\pi}\,\sqrt{N} \approx 2.08\sqrt{N}. \qquad \square$$
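The averaging step can be confirmed numerically. The sketch evaluates $2\int_0^{1/2}\sqrt{\pi/(1-x)}\,dx$, which is the average running time divided by $\sqrt{N}$, with the midpoint rule. It then compares the result with the closed form $2(2 - \sqrt{2})\sqrt{\pi}$.

```python
import math

def gs_constant(steps=200_000):
    """Midpoint-rule value of 2 * integral_0^{1/2} sqrt(pi / (1 - x)) dx, i.e. the
    average running time of the idealised Gaudry-Schost algorithm divided by sqrt(N)."""
    h = 0.5 / steps
    return 2 * h * sum(math.sqrt(math.pi / (1 - (k + 0.5) * h)) for k in range(steps))

closed_form = 2 * (2 - math.sqrt(2)) * math.sqrt(math.pi)
print(gs_constant(), closed_form)
```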


This result has been improved to $2.05\sqrt{N}$ by using smaller sets for T and W
in 0. 

2.2 Pseudorandom Walks and Practical Considerations 

Gaudry and Schost present the result in Theorem 2, but they also consider the practical implementation of the algorithm. First, to reduce storage, one does not record every element sampled by the pseudorandom walk but instead uses distinguished points. If we let θ be the probability that an element of the group is a distinguished point, then walks are of length 1/θ on average and we require storage of around $\theta\sqrt{N}$ group elements.

Second, it is necessary to use a pseudorandom walk which performs close
enough to sampling uniformly at random that the Tame-Wild birthday paradox
still applies. Gaudry and Schost, as with the kangaroo method, partition the
group into, say, 32 subsets and use a pseudorandom walk where each step is
a multiplication of the current group element by $g^{a_i}$, where $a_i$ is a fixed small
positive integer, if the current group element lies in the i-th block of the partition.
Our algorithms will necessarily have random walks which step in a “side-to-side”
manner, since the equivalence class representative of a group element could be
its inverse and steps to the right from the inverse of a group element are the
same as steps to the left from the original element. Hence, though we take $a_i \in \mathbb{N}$
and each step is multiplication by $g^{a_i}$, in practice the walks look like the jumps
are of lengths $\pm a_i$. We denote by m the mean of the integers $a_i$ and call it the
mean absolute step size. For the analysis we recall the following result (note that
the mean absolute step size in this walk is $1/2$).

Lemma 1. (Coffman, Flajolet, Flatto and Hofri [?]) Let $y_0, y_1, \ldots, y_k$ be a symmetric
random walk that starts at the origin ($y_0 = 0$) and takes steps uniformly
distributed in $[-1, +1]$; then the expected maximum excursion is

$$\sqrt{\frac{2k}{3\pi}} + O(1).$$
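The following Monte Carlo sketch illustrates Lemma 1, whose estimate we take to be $\sqrt{2k/(3\pi)} + O(1)$, reading “maximum excursion” as $\max_{0 \le j \le k} y_j$. This reading is our interpretation, chosen because it agrees with the $m\sqrt{2/(3\pi\theta)}$ start-margin used in Section 5. The step count, trial count and seed are arbitrary.

```python
import math
import random

def max_excursion(k, rng):
    """Maximum of y_0, ..., y_k for a walk with steps uniform in [-1, 1]."""
    y, best = 0.0, 0.0            # y_0 = 0 is included in the maximum
    for _ in range(k):
        y += rng.uniform(-1.0, 1.0)
        best = max(best, y)
    return best

def mean_max_excursion(k, trials, seed=1):
    """Sample mean of the maximum excursion over independent walks."""
    rng = random.Random(seed)
    return sum(max_excursion(k, rng) for _ in range(trials)) / trials

k = 400
simulated = mean_max_excursion(k, trials=2000)
predicted = math.sqrt(2 * k / (3 * math.pi))   # about 9.21 for k = 400
print(simulated, predicted)
```

The simulated mean sits slightly below the prediction for moderate k, which is the kind of discrepancy the $O(1)$ term allows for.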



The average ‘distance’ covered by a random walk, from its starting point to when
it hits a distinguished point, is therefore approximately $m/\sqrt{\theta}$. To have good random walks it is
essential that this value is sufficiently large so that each walk covers a reasonable
proportion of the tame or wild set. If not, then the walks stay very close to their
starting point and the probability of two walks colliding is small. On the other
hand, when $m/\sqrt{\theta}$ is large then there is a good chance that the pseudorandom
walks will sometimes travel outside T or W. Steps outside the regions of interest
cannot be included in our probabilistic analysis and so such steps are “wasted”.
To reduce these wasted steps it is necessary to start walks inside a subset of T
and W. More details about how to do this are given in [8].

We therefore state the following heuristic result. The factor $1 + \epsilon$ takes into
account the failure of a pseudorandom walk to behave exactly like a random
walk, in particular due to effects at the boundaries of the regions.

Heuristic 1. The average expected running time for the Gaudry-Schost algorithm
to solve the DLP in an interval of size N is $2.08(1 + \epsilon)\sqrt{N} + 1/\theta$ group
operations for some small $\epsilon > 0$.

We admit that the statement of Heuristic 1 (and Heuristic 2 later) is essentially
vacuous (for example, is $\epsilon = 1$ “small”?). We would like to be able to replace
$\epsilon$ by $o(1)$. This may be reasonable for Heuristic 1 but it seems unlikely to be
reasonable for Heuristic 2. Certainly we feel it is reasonable to suggest that $\epsilon$
can be less than 0.1 in both Heuristics 1 and 2.

The standard Gaudry-Schost algorithm is therefore not as fast as the van 
Oorschot and Wiener version of the Pollard kangaroo method. Nevertheless, 
we will improve upon their approach in groups with fast inversion, to obtain a 
method faster than any known method based on kangaroos. 

3 Equivalence Classes 

Following the work of Gallant, Lambert and Vanstone [12] and Wiener and
Zuccherato [?] it is natural to consider a pseudorandom walk on a set of
equivalence classes. For the DLP in an interval this only seems to give an improvement
when the equivalence class is a set of group elements all of whose
discrete logarithms lie in the interval. Groups with fast inversion are good candidates
for this.

It is necessary to be able to compute a unique representative of the equivalence
class so that one can define a deterministic pseudorandom walk on the
equivalence classes. For example consider the group of points on an elliptic curve
$E : y^2 = x^3 + Ax + B$ over a finite field $\mathbb{F}_q$ where q is an odd prime. If we let
$P = (x_P, y_P) \in E(\mathbb{F}_q)$ then the inverse of P is simply $-P = (x_P, -y_P)$. Now we
need a rule to define a unique representative for each equivalence class $\{P, -P\}$.
A simple rule in this case is: treat the y-coordinate of P as an integer $0 \le y_P < q$
and let the unique representative be $(x_P, \min\{y_P, q - y_P\})$. The pseudorandom
walk is then defined using the unique equivalence class representative.
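A minimal sketch of this representative rule, on a toy curve chosen only for illustration ($y^2 = x^3 + x + 1$ over $\mathbb{F}_{23}$, not a curve used in this paper). Points are affine pairs with coordinates in $[0, q)$ and the point at infinity is omitted; the helper names are hypothetical.

```python
q, A, B = 23, 1, 1   # toy parameters, for illustration only

def on_curve(P):
    x, y = P
    return (y * y - (x ** 3 + A * x + B)) % q == 0

def negate(P):
    x, y = P
    return (x, (-y) % q)

def class_representative(P):
    """Unique representative of {P, -P}: keep the smaller of y and q - y."""
    x, y = P
    return (x, min(y, (q - y) % q))

points = [(x, y) for x in range(q) for y in range(q) if on_curve((x, y))]
for P in points:
    rep = class_representative(P)
    assert rep == class_representative(negate(P))   # same class, same representative
    assert rep in (P, negate(P))                    # the representative is a class member
```

For example, $(3, 10)$ and $(3, 13)$ both lie on this curve and share the representative $(3, 10)$.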

If we denote elements of the group by their discrete logarithms and order
those in the interval $[-N/2, N/2]$, then the two elements in an equivalence class
are equidistant from the centre of the interval. A step to the right for one representative
of the equivalence class corresponds to a step to the left for the other.
Hence, when using equivalence classes there is no way to avoid having side-to-side
walks. This is essentially the reason why the standard kangaroo method
cannot be used with equivalence classes.

An important issue is that there is a danger of small cycles in the walks.
This phenomenon was noted by Gallant, Lambert and Vanstone [12] and Wiener
and Zuccherato [?]. This can cause the pseudorandom walks to never reach
a distinguished point. A method to get around this problem is “collapsing the
cycle”, which can be found in Gallant, Lambert and Vanstone [12, Section 6]. A
detailed analysis of these issues is given by Bos, Kleinjung and Lenstra [2].

It is natural to try to apply the Gaudry-Schost algorithm on equivalence 
classes to solve the DLP in an interval of size N. 

3.1 The Gaudry-Schost Algorithm on Equivalence Classes 

We only give a short sketch of the method, since our main result is a further
improvement on the basic idea. Recall that we wish to solve $h = g^n$ where
$-N/2 \le n \le N/2$. We assume that computing $h^{-1}$ for any h in the group is
much faster than a general group operation.

The natural approach is to perform random walks in sets of equivalence classes 
corresponding to the tame and wild sets of the standard Gaudry-Schost method. 
In other words, it is natural to make the following definition. 

Definition 1. Define the tame and wild sets by

$$T = \{\{a, -a\} : a \in [-N/2, N/2]\},$$

$$W = \{\{n + a, -(n + a)\} : a \in [-N/2, N/2]\}.$$

Note that $|T| = 1 + N/2 \approx N/2$.

As before, our main focus is on $T \cap W$. When $n = 0$ we have $T = W$ and
when n is large then $T \cap W$ is only about half the size of T. However, a subtlety
which did not arise in the previous case appears: when $n > N/4$ and $a > N/4$
there is only one way an equivalence class $\{n + a, -(n + a)\}$ can arise, but when
$|n|$ is small there can be two ways. Specifically, suppose $-N/4 < n < 0$; then the
equivalence class $\{n + a, -(n + a)\}$ can arise from a and from $a' = -2n - a$ (for
example, if $n = -N/8$ then $a = N/4$ and $a' = 0$ are such that $\{n + a, -(n + a)\} =
\{n + a', -(n + a')\}$). This phenomenon means that the Gaudry-Schost algorithm
samples from the wild set in a non-uniform way, so we cannot apply
Theorem 1 to determine the expected running time of the algorithm. We explain
these issues more precisely in the next section.

We do not give an analysis of the average case expected number of group 
operations for this algorithm. In the next section we make a further optimisation 
which leads to a better algorithm. A full analysis of the improved algorithm is 
then given. 

4 The New Algorithm 

We now give an algorithm for the discrete logarithm problem in an interval
for groups with efficient inversion. As usual, let N, g and h be given such that
$4 \mid N$, $h = g^n$ and $-N/2 \le n \le N/2$. The basic idea is to run the Gaudry-Schost
algorithm on the set of equivalence classes. A further speedup is given by
defining the wild set W to be, in some sense, smaller than the tame set.

Definition 2. We define the tame and wild sets (as sets of equivalence classes)
by

$$T = \{\{a, -a\} : a \in [-N/2, N/2]\},$$

$$W = \{\{n + a, -(n + a)\} : a \in [-N/4, N/4]\}$$

where, as always, $[N_1, N_2] = \{a \in \mathbb{Z} : N_1 \le a \le N_2\}$.

The algorithm is then immediate. One samples from T and W using pseudorandom
walks which are well-defined on equivalence classes. When a walk hits
a distinguished point then we store the representative of the equivalence class,
its discrete logarithm (or the discrete logarithm of the group element divided by
h), and the ‘type’ of the walk. When the same equivalence class is reached by
walks of both types then the discrete logarithm problem is solved.
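To make the bookkeeping concrete, here is a minimal sketch in $(\mathbb{Z}/p\mathbb{Z})^*$ with $p = 2^{31} - 1$ and $g = 7$ (a primitive root modulo this prime), where the classes are $\{x, x^{-1}\}$ with representative $\min(x, x^{-1})$. The group is our own illustrative choice, and the sketch is idealised in the sense of Theorem 4 below: points are drawn uniformly from T and W of Definition 2 instead of by a pseudorandom walk, and every point is stored rather than only distinguished points. A tame point $g^t$ and a wild point $hg^w$ in the same class give $t = n + w$ or $t = -(n + w)$, so both candidates are tested.

```python
import random

P = 2**31 - 1   # Mersenne prime; 7 is a primitive root modulo P
G = 7

def class_rep(x):
    """Unique representative of the class {x, x^-1} in (Z/PZ)^*."""
    return min(x, pow(x, -1, P))         # pow(x, -1, P) needs Python 3.8+

def solve_interval_dlp(h, N, rng):
    """Idealised sketch: return (n, selections) with G^n = h and |n| <= N/2.

    Tame points are G^a with a in [-N/2, N/2] and wild points are h*G^a with
    a in [-N/4, N/4] (Definition 2); both are sampled uniformly at random.
    """
    tame, wild = {}, {}                  # class representative -> exponent a
    selections = 0
    while True:
        for kind in ("tame", "wild"):    # alternate tame and wild selections
            selections += 1
            if kind == "tame":
                a = rng.randint(-N // 2, N // 2)
                rep = class_rep(pow(G, a, P))
                mine, other = tame, wild
            else:
                a = rng.randint(-N // 4, N // 4)
                rep = class_rep(h * pow(G, a, P) % P)
                mine, other = wild, tame
            if rep in other:
                t, w = (a, other[rep]) if kind == "tame" else (other[rep], a)
                # Same class: G^t = G^(n + w) or G^t = G^-(n + w).
                for n in (t - w, -t - w):
                    if abs(n) <= N // 2 and pow(G, n, P) == h:
                        return n, selections
            mine[rep] = a
```

For example, `solve_interval_dlp(pow(G, -12345, P), 2**20, random.Random(1))` returns `-12345` together with the number of selections used. A practical version would replace the uniform draws by walks on class representatives and store only distinguished points.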

To understand the algorithm it is necessary to consider a ‘fundamental domain’
for the sets. In other words, we consider sets which are in one-to-one
correspondence with the set of equivalence classes. A fundamental domain for
T is $\tilde{T} = [0, N/2]$; it is clear that every pair $\{a, -a\} \in T$ corresponds to
exactly one value $a \in [0, N/2]$. One choice of fundamental domain for W is
$\{|n| + a : a \in [-N/4, N/4]\}$. However, to visualise $T \cap W$ we really want the
fundamental domain for W to consist only of positive values, and this is not the
case when $|n| < N/4$. Hence, when $|n| < N/4$, we note that the set W is in
one-to-one correspondence with the multi-set

$$\tilde{W} = \{|n| + a : a \in [-|n|, N/4]\} \cup \{-(|n| + a) : a \in [-N/4, -|n|)\} = [0, |n| + N/4] + [0, N/4 - |n|). \qquad (1)$$
Fig. 2. The set T is pictured at the top of the diagram as a long black box. The sets
W are given for the values $n = 0, N/8, N/4$ and $N/2$ where the diagonal lines denote
single density and the cross hatching denotes double density (i.e., repetitions in the
multi-set).



When $|n| < N/4$, sampling uniformly from W corresponds to sampling uniformly
from the multi-set $\tilde{W}$, which in turn corresponds to sampling $a \in [0, |n| + N/4]$
with probability $4/N$ for $0 \le a < N/4 - |n|$ and probability $2/N$ for $N/4 - |n| \le
a \le |n| + N/4$. We describe this as saying that there is a ‘double density’ of
walks in the wild set.
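For a concrete check of the multi-set $\tilde{W}$ in (1), the sketch below enumerates the integers $a \in [-N/4, N/4]$ for the toy case $N = 64$, $n = -8$ (our choice) and counts how often each fundamental-domain value $|n + a|$ occurs. Values from 1 to $N/4 - |n| = 8$ occur twice and the others once, which is the ‘double density’ above up to the treatment of the endpoints.

```python
from collections import Counter

def wild_multiplicities(n, N):
    """How often each value |n + a|, a in [-N/4, N/4], arises (class {v, -v} -> |v|)."""
    return Counter(abs(n + a) for a in range(-N // 4, N // 4 + 1))

N, n = 64, -8                                # here |n| < N/4
counts = wild_multiplicities(n, N)
doubled = sorted(v for v, c in counts.items() if c == 2)
print(doubled[0], doubled[-1], max(counts))  # 1 8 24: up to N/4 - |n|, and |n| + N/4
```

Each of the $N/2 + 1$ values of a has probability about $2/N$, so a doubled value has probability about $4/N$, as stated in the text.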

To determine the complexity of the algorithm we need a generalisation of
Theorem 1. This is a variant of the birthday paradox which applies to coloured
balls and non-uniform probabilities. Such a result is proved in [?].

Theorem 3. Let $R \in \mathbb{N}$ and $0 \le A \le R/2$. Suppose we have an unlimited
number of balls of two colours, red and blue, and R urns. Suppose we alternately
choose balls of each colour and put them in random urns. Red balls are assigned
uniformly and independently to the urns. Blue balls are assigned to the urns
independently with the following probabilities: urns $1 \le u \le A$ are used with
probability $2/R$, urns $A < u \le R - A$ are used with probability $1/R$, and urns
$R - A < u \le R$ are used with probability 0. Then the expected number of assignments
that need to be made in total before we have an urn containing a ball of each
colour is $\sqrt{\pi R} + O(R^{1/4})$.

We refer to [?] for the proof. However, it is relatively easy to see why the result
should be true: the probability that a red ball and a blue ball fall in the same
urn is

$$A \cdot \frac{1}{R} \cdot \frac{2}{R} + (R - 2A) \cdot \frac{1}{R} \cdot \frac{1}{R} + A \cdot \frac{1}{R} \cdot 0 = \frac{1}{R},$$

which is exactly the same as the probability in the case where both red and
blue balls are distributed uniformly. One significant difference from the standard
Tame-Wild birthday paradox is that there is an increased chance of two or more
blue balls being placed in the same urn (and this has the effect of lowering the
probability of a collision among balls of different colour). Hence, Theorem 3 does
not seem to be an immediate consequence of the results in [?].
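The heuristic argument and Theorem 3 can be checked empirically with the sketch below. It confirms the same-urn probability $1/R$ with exact rational arithmetic, then simulates the alternating red/blue experiment with 0-based urn indices (urns $0, \ldots, A-1$ play the role of urns $1, \ldots, A$ in the statement). The parameters $R = 2^{16}$, $A = 2^{14}$, the seed and the trial count are arbitrary; agreement with $\sqrt{\pi R}$ is expected only up to sampling noise and the lower-order term.

```python
import math
import random
from fractions import Fraction

def same_urn_probability(R, A):
    """Exact P(a red and a blue ball share an urn) for the distributions of Theorem 3."""
    return (A * Fraction(1, R) * Fraction(2, R)
            + (R - 2 * A) * Fraction(1, R) * Fraction(1, R)
            + A * Fraction(1, R) * 0)

def assignments_until_bichromatic(R, A, rng):
    """Alternate red and blue balls; count assignments until an urn holds both colours."""
    red, blue = set(), set()
    count = 0
    while True:
        count += 1
        u = rng.randrange(R)               # red: uniform over urns 0..R-1
        if u in blue:
            return count
        red.add(u)
        count += 1
        u = rng.randrange(R)               # blue: fold urns R-A..R-1 onto 0..A-1,
        if u >= R - A:                     # so urns 0..A-1 get probability 2/R and
            u -= R - A                     # the last A urns get probability 0
        if u in red:
            return count
        blue.add(u)

R, A = 2**16, 2**14
rng = random.Random(2010)
trials = 400
mean = sum(assignments_until_bichromatic(R, A, rng) for _ in range(trials)) / trials
ratio = mean / math.sqrt(math.pi * R)      # Theorem 3 predicts a value close to 1
print(same_urn_probability(R, A) == Fraction(1, R), ratio)
```

Folding the last A urns onto the first A is simply a cheap way to realise the three-level distribution with a single uniform draw.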

Theorem 4. If elements are sampled uniformly at random with replacement
alternately from the sets T and W of Definition 2 and recorded, the expectation,
over all problem instances, of the number of selections before a tame-wild
collision is

$$\left(5\sqrt{2}/4 - 1\right)\sqrt{\pi N} \approx 1.36\sqrt{N}.$$

Proof. Let $h = g^{xN}$ for $-1/2 \le x \le 1/2$. Due to symmetry we only need to
look at the positive half of the interval of exponents. As we have seen, when
$0 \le x \le 1/4$ we have $W \subseteq T$ and we are sampling in $T \cap W$ uniformly with the
tame elements and non-uniformly with the wild elements. On the other hand,
when $1/4 < x \le 1/2$ then T and W are both sampled uniformly, but $T \cap W$ is
now a proper subset of T and W in general. The analysis therefore breaks into
two cases.

In the case $0 \le x \le 1/4$, by Theorem 3 (taking R to be the size of the fundamental
domain for T, which is $N/2$), the expected number of group operations
to get a collision is $\sqrt{\pi N/2}$.

In the case $1/4 < x \le 1/2$ one sees that $|T \cap W| = 3N/4 - xN = N(3/4 - x)$
(here by $|T \cap W|$ we mean the number of equivalence classes in the intersection)
and we are in a very similar situation to the proof of Theorem 2. We need to
sample $\sqrt{\pi |T \cap W|}$ points in $T \cap W$ (half of them tame and half wild). Since
$|T| = |W| = N/2$ we expect to sample

$$2 \cdot \frac{N/2}{N(3/4 - x)} \cdot \frac{1}{2}\sqrt{\pi N(3/4 - x)} = \frac{1}{2}\sqrt{\pi N/(3/4 - x)}$$

group elements in total.

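We now average over all problem instances to get an average case running
time:

$$2\int_0^{1/4}\sqrt{\pi N/2}\,dx + 2\int_{1/4}^{1/2}\frac{1}{2}\sqrt{\frac{\pi N}{3/4 - x}}\,dx = \frac{1}{2}\sqrt{\pi N/2} + \sqrt{\pi N}\left(\sqrt{2} - 1\right) = \sqrt{\pi N}\left(\frac{5\sqrt{2}}{4} - 1\right). \qquad \square$$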
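Theorem 4 can likewise be checked by simulating the idealised experiment it describes. The sketch below works directly with fundamental-domain values (a class $\{v, -v\}$ is represented by $|v|$), draws a fresh instance $n \in [-N/2, N/2]$ for each trial, alternates uniform tame samples from T and wild samples from W as in Definition 2, and reports the mean number of selections divided by $\sqrt{N}$. The values $N = 2^{18}$, 400 trials and the seed are arbitrary, so agreement with $(5\sqrt{2}/4 - 1)\sqrt{\pi} \approx 1.36$ should only be expected to within a few percent.

```python
import math
import random

def selections_until_collision(n, N, rng):
    """Alternate uniform tame/wild samples (Definition 2) until a tame-wild collision.

    A class {v, -v} is represented by |v|, i.e. by a point of the fundamental domain.
    """
    tame, wild = set(), set()
    count = 0
    while True:
        count += 1
        v = abs(rng.randint(-N // 2, N // 2))          # tame class {a, -a}
        if v in wild:
            return count
        tame.add(v)
        count += 1
        v = abs(n + rng.randint(-N // 4, N // 4))      # wild class {n+a, -(n+a)}
        if v in tame:
            return count
        wild.add(v)

def estimated_constant(N, trials, seed=2010):
    """Average selections over random instances n in [-N/2, N/2], divided by sqrt(N)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        n = rng.randint(-N // 2, N // 2)
        total += selections_until_collision(n, N, rng)
    return total / (trials * math.sqrt(N))

estimate = estimated_constant(2**18, trials=400)
exact = (5 * math.sqrt(2) / 4 - 1) * math.sqrt(math.pi)   # about 1.36
print(estimate, exact)
```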

This result suggests the following heuristic statement about the running time of 
the algorithm using pseudorandom walks. The value e takes into account various 
undesirable properties of the pseudorandom walk, such as irregular probability 
distributions at the boundaries of the regions and detecting and escaping from 
fruitless cycles. 

Heuristic 2. Our algorithm to solve the DLP in an interval of size N in a
group with fast inversion has average case expected running time of approximately
$1.36(1 + \epsilon)\sqrt{N} + 1/\theta$ group operations for some small $\epsilon > 0$.
As mentioned earlier, we believe that $\epsilon$ can be taken to be less than 0.1 and our
experimental results suggest this is reasonable.

This is a significant improvement on the standard Gaudry-Schost algorithm
(Heuristic 1) and the Improved Pollard kangaroo method with heuristic running
time $1.71\sqrt{N}$ group operations [5].

5 Experimental Results 

We implemented the Improved Gaudry-Schost algorithm using equivalence classes
for solving the DLP in an interval using the software package Magma. The group
used was the group of points on the following elliptic curve

$$E : y^2 = x^3 + 40x + 1 \text{ over } \mathbb{F}_p$$

where $p = 3645540875029913$. The group of points has cardinality
$\#E(\mathbb{F}_p) = 3645540854261153 > 2^{51}$.

We picked various interval sizes and ran a number of experiments on those
intervals. Each experiment involved choosing uniformly at random $-N/2 \le n \le
N/2$ and solving the DLP for $Q = [n]P$. We counted the number of group
operations performed and averaged this over the total number of trials. Walks
were not permitted to start within a distance $m\sqrt{2/(3\pi\theta)}$ from the edge of any of
the sets (this is roughly half the size of the expected maximum distance travelled
by a walk).

The average number of group operations performed for the different experiments
are given in Table 1.

To detect small cycles we stored the previous 30 group elements in the walk
in the case $N \approx 2^{34}$ (respectively, 35 and 45 group elements for $N \approx 2^{40}$ and $2^{48}$). Each
new step in the walk was compared with the previous 30 (respectively, 35, 45)
group elements visited. If this group element had already been visited then the 
walk is in a cycle. We used a deterministic method to jump out of the cycle (using 
a jump of distinct length from the other jumps used in the pseudorandom walk) 
so that the pseudorandom walk as a whole remained deterministic. The cost of 
searching the list of previous group elements is not included in our experimental 
results, but our count of group operations does include the “wasted” steps from 
being in a cycle. 
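The cycle handling described above can be sketched as follows. The group $(\mathbb{Z}/p\mathbb{Z})^*$ with $p = 2^{31} - 1$ and classes $\{x, x^{-1}\}$, the jump set, the distinguished-point mask and the rule of leaving from the smallest element of the detected cycle are all our own assumptions for illustration; the text only says the escape is deterministic and uses a jump of distinct length. Exiting from the cycle's minimum means that two walks entering the same short cycle at different points leave it the same way.

```python
from collections import deque

P = 2**31 - 1                  # toy group (Z/PZ)^*, classes {x, x^-1}
G = 7

def rep(x):
    """Class representative min(x, x^-1); pow(x, -1, P) needs Python 3.8+."""
    return min(x, pow(x, -1, P))

def walk_to_distinguished(start, jumps, escape, window=30, dp_mask=63,
                          max_steps=100_000):
    """Walk on classes {x, x^-1} until a distinguished point (x & dp_mask == 0).

    The last `window` representatives are kept; a repeat means the walk is in a
    cycle, and it leaves deterministically from the cycle's smallest element
    using the extra jump G^escape. Returns (point or None, iterations, escapes).
    """
    g_jumps = [pow(G, a, P) for a in jumps]
    g_escape = pow(G, escape, P)
    x = rep(start)
    history = deque(maxlen=window)
    escapes = 0
    for steps in range(1, max_steps + 1):
        x = rep(x * g_jumps[x % len(g_jumps)] % P)
        if x in history:
            recent = list(history)
            cycle = recent[recent.index(x):]     # elements of the detected cycle
            x = rep(min(cycle) * g_escape % P)   # same exit for any entry point
            history.clear()                      # (for cycles shorter than window)
            escapes += 1
        if x & dp_mask == 0:
            return x, steps, escapes
        history.append(x)
    return None, max_steps, escapes
```

The returned iteration count does not include the extra multiplication each escape costs. With only 8 jumps, fruitless 2-cycles of the form $x \to (xg^{a_i})^{-1} \to x$ occur often enough that escapes are typically observed on a single walk.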

We terminated walks which ran for $5/\theta$ steps without arriving at a distinguished
point (the usual recommendation is $20/\theta$ steps; see [22]). This will give
us a slightly worse running time than optimal.

There is plenty of room for improvement in these experimental results. First, 
techniques like those in [2] to handle cycles should lead to improved running 
times (though note that we cannot use doublings/squarings when working in an 
interval). Second, the relationship between the values of m and θ is probably not
optimal. Third, one might get better results by not running the same number of 
tame walks as wild walks or by slightly changing the sizes of the tame and wild 
regions. 
Table 1. Average number of group operations performed by our algorithm for different
values of N

Experiment | N | m | θ | # of experiments | Improved GS on equivalence classes
1 | $\approx 2^{34}$ | $2^{8}$ | $2^{-5}$ | 1000 | $1.49\sqrt{N}$
2 | $\approx 2^{40}$ | $2^{11}$ | $2^{-5}$ | 300 | $1.47\sqrt{N}$
3 | $\approx 2^{48}$ | $2^{14.5}$ | $2^{-6}$ | 50 | $1.46\sqrt{N}$


6 Conclusion 

We have presented the first algorithm to exploit equivalence classes for the discrete
logarithm problem in an interval. Our algorithm can be applied in groups
where we have fast inversion such as in the group of points on an elliptic curve.
The average expected running time of our algorithm is close to $1.36\sqrt{N}$ group
operations. Our practical experiments confirm that we can achieve a significant
improvement over previous methods.
A Background on the Pollard Kangaroo Method

We first briefly recall the Pollard kangaroo method using distinguished points
as described by van Oorschot and Wiener [22] and Pollard [25]. To fix notation:
we are given g, h and N and asked to find $0 \le n < N$ such that $h = g^n$.

As with the rho method, the kangaroo method relies on a pseudorandom walk;
however, steps in the kangaroo walk correspond to known small increments in
the exponent (in other words, kangaroos make small jumps). The tame kangaroo
starts in the middle of the interval (i.e., at $g^{N/2}$) and jumps towards the right.
The wild kangaroo starts at the group element h and jumps to the right using
the same pseudorandom walk. On a serial computer one alternately jumps the
tame and wild kangaroos. Every now and then a tame or wild kangaroo lands
on a distinguished group element u and stores it in a sorted list, binary tree
or hash table together with its discrete logarithm (if the kangaroo is tame) or
the discrete logarithm of $uh^{-1}$ (if the kangaroo is wild). Once the same group
element is visited twice by different kangaroos the DLP is solved.
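A minimal serial sketch of the method just described, in $(\mathbb{Z}/p\mathbb{Z})^*$ with $p = 2^{31} - 1$ and generator $g = 7$. These are illustrative choices, as are the power-of-two jump set, the partition by $x \bmod k$, the distinguished-point rule $x \bmod D = 0$ and the restart rule; none of them is prescribed by the text. The wild kangaroo's stored value is the logarithm of $uh^{-1}$, so a tame/wild match at the same distinguished point gives n as the difference of the two stored values. If the kangaroos do not meet within a step budget, the sketch restarts the wild kangaroo from $hg^{s}$ for the next shift s.

```python
import math

def kangaroo(g, h, p, N, restarts=20):
    """Find n in [0, N) with g^n = h (mod p), or return None (g assumed a generator).

    Serial sketch: one tame kangaroo starting at g^(N//2), one wild kangaroo
    starting at h * g^shift, both jumping right with the same pseudorandom walk;
    distinguished points go into a dict.
    """
    target = max(1, math.isqrt(N) // 2)          # mean jump about sqrt(N)/2
    k = 1
    while (2**k - 1) / k < target:               # jumps 1, 2, 4, ..., 2^(k-1)
        k += 1
    jumps = [2**i for i in range(k)]
    g_jumps = [pow(g, s, p) for s in jumps]
    D = max(1, math.isqrt(N) // 16)              # distinguished iff x % D == 0
    max_steps = 8 * math.isqrt(N) + 8 * D

    for shift in range(restarts):                # retry with a shifted wild start
        traps = {}                               # x -> (kind, exponent offset)
        pos = {"tame": [pow(g, N // 2, p), N // 2],
               "wild": [h * pow(g, shift, p) % p, shift]}
        for _ in range(max_steps):
            for kind in ("tame", "wild"):        # alternate the two kangaroos
                x, d = pos[kind]
                i = x % k                        # partition block of x
                x, d = x * g_jumps[i] % p, d + jumps[i]
                pos[kind] = [x, d]
                if x % D:
                    continue                     # not a distinguished point
                if x in traps and traps[x][0] != kind:
                    other = traps[x][1]
                    # tame exponent t and wild offset w satisfy g^t = h * g^w
                    t, w = (d, other) if kind == "tame" else (other, d)
                    n = (t - w) % (p - 1)
                    if n < N and pow(g, n, p) == h:
                        return n
                traps[x] = (kind, d)
    return None
```

For instance, `kangaroo(7, pow(7, 12345, 2**31 - 1), 2**31 - 1, 2**20)` should return 12345. Reducing $t - w$ modulo $p - 1$ relies on g being a generator.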

The kangaroo method is not analysed using the birthday paradox but using
the mean step size m of the pseudorandom walks. Once the rear kangaroo reaches
the starting point of the front kangaroo it is jumping over a region where roughly
one in m group elements have been visited by the front kangaroo. Hence, there
is a roughly $1/m$ probability at each step that the back kangaroo lands on a
footprint of the front kangaroo. Therefore, the walks collide after an expected
m steps.

One obtains the heuristic average case expected running time of approximately
$2\sqrt{N}$ group operations as follows: choose $m = \sqrt{N}/2$. The rear kangaroo is, on
average, distance $N/4$ from the front kangaroo. The rear kangaroo therefore performs
$N/(4m)$ jumps to reach the starting point of the front kangaroo, followed
by m more steps until the walks collide (and then a small number more steps
until a distinguished point is hit). Since there are two kangaroos in action the
total running time is roughly $2(N/(4m) + m) = 2\sqrt{N}$ group operations.
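The choice $m = \sqrt{N}/2$ is the minimiser of the cost function in the last paragraph, as this short derivation shows:

```latex
f(m) = 2\left(\frac{N}{4m} + m\right), \qquad
f'(m) = 2\left(1 - \frac{N}{4m^{2}}\right) = 0 \iff m = \frac{\sqrt{N}}{2},
\qquad f''(m) = \frac{N}{m^{3}} > 0,
\qquad f\!\left(\frac{\sqrt{N}}{2}\right) = 2\left(\frac{\sqrt{N}}{2} + \frac{\sqrt{N}}{2}\right) = 2\sqrt{N}.
```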
B Two-Dimensional Problems

One can consider the multi-dimensional DLP: given $g_1, \ldots, g_d, h$ and bounds
$N_1, \ldots, N_d$, compute $n_1, \ldots, n_d \in \mathbb{Z}$ such that $h = g_1^{n_1} \cdots g_d^{n_d}$ and $|n_i| \le N_i$
for $1 \le i \le d$. We call the integer d the dimension. The size of the solution
region is $N = \prod_{i=1}^{d} (2N_i + 1)$. This problem arises in a number of applications.
For example, Gaudry and Schost [14] use algorithms for the 2-dimensional DLP
in point counting on hyperelliptic curves of genus 2.

The 2-dimensional DLP also arises if one tries to analyse the security of elliptic
curve cryptography using the Gallant-Lambert-Vanstone (GLV) method [12]. In
this method one has an efficiently computable group homomorphism ψ and one
computes nP for $P \in E(\mathbb{F}_q)$ and $n \in \mathbb{N}$ as $n_1P + n_2\psi(P)$ where $|n_1|, |n_2| \approx \sqrt{n}$.
There is an algorithm to compute the pair $(n_1, n_2)$ from n, but a natural trick
is to choose $(n_1, n_2)$ directly. It is tempting to choose $|n_1|$ and $|n_2|$ to be a
little smaller than $\sqrt{n}$, and the extent to which this can be done without losing
security depends on the difficulty of the 2-dimensional DLP. Gaudry and Schost
do not give a precise figure for the running time of this algorithm but we have
the following heuristic under the usual assumptions (see [8] for further details of
this result and an improvement of the constant from 2.43 to 2.36).

Heuristic 3. The Gaudry and Schost [14] algorithm solves the 2-dimensional
DLP as above in $2.43(1 + \epsilon)\sqrt{N} + 1/\theta$ group operations for small $\epsilon > 0$.


B.1 Solving Using Equivalence Classes

In groups with efficiently computable inverse (such as the groups of interest to
Gaudry and Schost and the GLV method) one can consider equivalence classes
as we did in the 1-dimensional case. To be precise, let

$$T = \{\{(x, y), (-x, -y)\} : x, y \in \mathbb{Z},\ -N_1 \le x \le N_1,\ -N_2 \le y \le N_2\}$$

be the set of equivalence classes of points in a box of area $N = (2N_1 + 1)(2N_2 + 1)$
centered at 0. For $(n_1, n_2)$ such that $-N_i \le n_i \le N_i$ ($i \in \{1, 2\}$) we consider the
set

$$W = \{\{(n_1 + u_1, n_2 + u_2), (-(n_1 + u_1), -(n_2 + u_2))\} : u_1, u_2 \in \mathbb{Z},\ -N_1/2 \le u_1 \le N_1/2,\ -N_2/2 \le u_2 \le N_2/2\}.$$

To analyse the algorithm again requires visualising the sets via a ‘fundamental
domain’. Since the map $(x, y) \mapsto (-x, -y)$ is rotation by 180 degrees, a natural
fundamental domain is the half-plane $y \ge -x$. One therefore defines the
fundamental domain $\tilde{T}$ for T to be

$$\tilde{T} = \{(x, y) : -N_1 \le x \le N_1,\ -x \le y \le N_2\}.$$

Note that $|\tilde{T}| \approx 2N_1N_2$. A fundamental domain for W which is contained in $\tilde{T}$ is
easily defined, but note that, as in Section 4, this can be a multi-set and we can
again be in the case of non-uniform sampling of W. Indeed, when $0 \le n_1 < N_1/2$
and $0 \le n_2 < N_2/2$ this is the case and the region which has ‘double density’
has area $A = (N_1 - 2n_1)(N_2 - 2n_2)$. When $n_1 > N_1/2$ or $n_2 > N_2/2$ then the
distribution on W is uniform but W is not usually contained in $\tilde{T}$ any more. In
these cases $|T \cap W|$ varies between $N_1N_2 = \frac{1}{2}|\tilde{T}|$ and $\frac{1}{4}N_1N_2 = \frac{1}{8}|\tilde{T}|$.

We considered an algorithm which chooses elements from T and W uniformly 
at random, selecting elements with a ratio of 2 : 1 (i.e., twice as many tame 
walks as wild walks, since the tame set is at least twice as big as the wild set). 

Our rough calculations suggest that the algorithm (when using a truly random
walk) should require fewer than $2.01\sqrt{N}$ group operations. This is a significant
speedup over the algorithm of Gaudry and Schost for the cases of practical interest.
It remains an open problem to find optimal parameters for this algorithm,
to analyse its complexity precisely, and to give experimental results which show
how closely one can get to the idealised theoretical analysis.

B.2 Larger Equivalence Classes in the GLV Method 

We now assume that $N_1 = N_2$ and that the 2-dimensional DLP of interest is
$Q = n_1P + n_2\psi(P)$ with $|n_1|, |n_2| \le N_1$. Again write $N = (2N_1 + 1)(2N_2 + 1)$.
Since one knows the logarithm of $\psi(P)$ to the base P it is sufficient to compute
$n_1$ and $n_2$.

Frequently with the GLV method [12] the homomorphism ψ satisfies $\psi^2 = -1$.
This happens, for example, with the standard curve $y^2 = x^3 + Ax$ over $\mathbb{F}_p$ with
$\psi(x, y) = (-x, iy)$, where $i \in \mathbb{F}_p$ satisfies $i^2 = -1$. It also holds for the homomorphisms
used by Galbraith, Lin and Scott [7]. In this setting one can consider the equivalence classes

$$\{Q, -Q, \psi(Q), -\psi(Q)\}$$

of size 4. If $Q = n_1P + n_2\psi(P)$ then these 4 points correspond to the pairs of
exponents

$$\{(n_1, n_2), (-n_1, -n_2), (-n_2, n_1), (n_2, -n_1)\}$$

and so action by ψ corresponds to rotation by 90 degrees.
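The sketch below checks these facts on a toy curve of our own choosing, $y^2 = x^3 + 2x$ over $\mathbb{F}_{13}$ with $i = 5$ (since $5^2 \equiv -1 \pmod{13}$): ψ maps the curve to itself, $\psi^2 = -1$, and every affine point other than $(0, 0)$ lies in a class of size 4. It also models the induced action $(n_1, n_2) \mapsto (-n_2, n_1)$ on exponent pairs and picks the class member in the quadrant $\{x > 0,\ y \ge 0\}$; all helper names are hypothetical.

```python
p, A, I = 13, 2, 5            # toy curve y^2 = x^3 + A*x over F_p, with I^2 = -1 mod p

def on_curve(P):
    x, y = P
    return (y * y - (x ** 3 + A * x)) % p == 0

def neg(P):
    x, y = P
    return (x, (-y) % p)

def psi(P):
    """The endomorphism (x, y) -> (-x, i*y)."""
    x, y = P
    return ((-x) % p, I * y % p)

points = [(x, y) for x in range(p) for y in range(p) if on_curve((x, y))]

def eq_class(P):
    """The class {Q, -Q, psi(Q), -psi(Q)}; min() of it is a canonical representative."""
    return {P, neg(P), psi(P), neg(psi(P))}

def rotate(n1, n2):
    """psi acts on exponent pairs as rotation by 90 degrees."""
    return (-n2, n1)

def canonical_pair(n1, n2):
    """Rotate (n1, n2) != (0, 0) into the quadrant {x > 0, y >= 0}."""
    for _ in range(4):
        if n1 > 0 and n2 >= 0:
            return (n1, n2)
        n1, n2 = rotate(n1, n2)
    raise ValueError("(0, 0) has no representative in the quadrant")
```

For example, `canonical_pair(-3, 2)` rotates three times and returns `(2, 3)`. Including the positive x-axis but excluding the positive y-axis is what makes the quadrant pick exactly one member of each orbit.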

It is natural to apply the Gaudry-Schost algorithm on these equivalence
classes. We take the sets T and W analogous to those in Section B.1. Finding
a suitable fundamental domain for the symmetry under rotation is not hard
(for example take $\{(x, y) : 0 < x,\ 0 \le y\}$). One now finds that some regions of
the wild set can have quadruple density (as well as single and double density).

Again, it remains an open problem to determine the optimal algorithm for
this problem and to estimate its complexity. A very rough calculation suggests
that there is an algorithm for this problem (using a truly random walk) which
requires fewer than $1.11\sqrt{N}$ group operations.
Abstract. In functional encryption (FE) schemes, ciphertexts and private
keys are associated with attributes and decryption is possible whenever
key and ciphertext attributes are suitably related. It is known that
expressive realizations can be obtained from a simple FE flavor called
inner product encryption (IPE), where decryption is allowed whenever
ciphertext and key attributes form orthogonal vectors. In this paper, we
construct (non-anonymous) IPE systems with constant-size ciphertexts
for the zero and non-zero evaluations of inner products. These schemes
respectively imply an adaptively secure identity-based broadcast encryption
scheme and an identity-based revocation mechanism that both feature
short ciphertexts and rely on simple assumptions in prime order
groups. We also introduce the notion of negated spatial encryption, which
subsumes non-zero-mode IPE and can be seen as the revocation analogue
of the spatial encryption primitive of Boneh and Hamburg.

Keywords: Functional encryption, identity-based broadcast encryption, 
revocation, efficiency. 

1 Introduction 

Ordinary encryption schemes usually provide coarse-grained access control since, 
given a ciphertext, only the holder of the private key can obtain the plaintext. In 
many applications such as distributed file systems, the need for fine-grained and 
more complex access control policies frequently arises. To address these concerns, 
several kinds of functional public key encryption schemes have been studied. 

Functional encryption can be seen as a generalization of identity-based encryption
(IBE) [24, 8]. In IBE schemes, the receiver’s ability to decrypt is merely
contingent on his knowledge of a private key associated with an identity that
matches a string chosen by the sender. In contrast, functional encryption (FE)
systems make it possible to decrypt using a private key $sk_x$ corresponding to a
set x of atomic elements, called attributes, that is suitably related (according to
some well-defined relation R) to another attribute set y specified by the sender.

* This author acknowledges the Belgian National Fund for Scientific Research (F.R.S.-F.N.R.S.)
for their support and the BCRYPT Interuniversity Attraction Pole.
The goal of this paper is to describe new (pairing-based) functional encryption
constructions providing short ciphertexts (ideally, their length should not
depend on the size of attribute sets) while providing security against adaptive
adversaries or supporting negation (e.g. decryption should be denied to holders
of private keys $sk_x$ for which $R(x, y) = 1$).

Related Work. The first flavor of functional encryption traces back to the
work of Sahai and Waters [22] that was subsequently extended in [?]. Their
concept, called attribute-based encryption (ABE), allows a sender to encrypt data
under a set of attributes ω while an authority generates private keys for access
control policies T. Decryption rights are granted to anyone holding a private key
for a policy T such that $T(\omega) = 1$. Identity-based broadcast encryption (IBBE)
[?] and revocation (IBR) [?] schemes can also be thought of as functional
encryption systems where ciphertexts are encrypted for a set of identities
$S = \{ID_1, \ldots, ID_n\}$: in IBBE (resp. IBR) systems, decryption requires holding a
private key $sk_{ID}$ for which $ID \in S$ (resp. $ID \notin S$).

The above kinds of functional encryption systems are only payload hiding in
that they keep encrypted messages back from unauthorized parties but ciphertexts
do not hide their underlying attribute set. Predicate encryption schemes
[?] additionally provide anonymity as ciphertexts also conceal the attribute
set they are associated with, which enables efficient searches over
encrypted data [?]. In [15], Katz, Sahai and Waters devised a predicate encryption
scheme for inner products: a ciphertext encrypted for the attribute vector Y can
be opened by any key $sk_X$ such that $X \cdot Y = 0$. As shown in [?], inner product
encryption (IPE) suffices to give functional encryption for a number of relations
corresponding to the evaluation of polynomials or CNF/DNF formulae.
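As an illustration of how IPE captures the IBBE and IBR relations, the sketch below uses the standard polynomial encoding. This encoding is an assumption here; the constructions in this paper may encode attributes differently. A ciphertext for $S = \{ID_1, \ldots, ID_n\}$ uses the coefficient vector Y of $\prod_i (z - ID_i)$, and a key for ID uses $X = (1, ID, \ldots, ID^n)$, so that $X \cdot Y = \prod_i (ID - ID_i)$. Then $X \cdot Y = 0$ exactly when $ID \in S$ (the IBBE condition) and $X \cdot Y \ne 0$ exactly when $ID \notin S$ (the IBR condition). Only the relation is modelled, with no cryptography; the prime modulus and the identities are hypothetical.

```python
MOD = 2**61 - 1      # a prime modulus (hypothetical choice for this illustration)

def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod_r (z - r) mod MOD."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - r * c) % MOD          # contributes -r * c_i z^i
            nxt[i + 1] = (nxt[i + 1] + c) % MOD      # contributes c_i z^(i+1)
        coeffs = nxt
    return coeffs

def key_vector(identity, n):
    """X = (1, ID, ID^2, ..., ID^n) mod MOD."""
    return [pow(identity, i, MOD) for i in range(n + 1)]

def inner(X, Y):
    return sum(a * b for a, b in zip(X, Y)) % MOD

S = [17, 42, 1001]                 # hypothetical identities, already mapped into Z_MOD
Y = poly_from_roots(S)             # ciphertext attribute vector for the set S
n = len(S)
print([inner(key_vector(i, n), Y) == 0 for i in (17, 42, 1001, 5)])  # [True, True, True, False]
```

Because the modulus is prime, the product $\prod_i (ID - ID_i)$ vanishes only when one factor does, which is why membership and the zero inner product coincide.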

Our Contributions. While quite useful, the IPE scheme of [15] strives to
anonymize ciphertexts, which makes it difficult to break through the linear complexity
barrier (in the vector length n) in terms of ciphertext size. It indeed
seems very hard to avoid such a dependency as long as anonymity is required:
for instance, anonymous FE constructions [?] suffer from the same overhead.
A similar problem appears in the context of broadcast encryption, where the only
known scheme [3] that conceals the receiver set also has O(n)-size ciphertexts.

This paper focuses on applications of IPE schemes, such as identity-based
broadcast encryption and revocation systems, where the anonymity property is
not fundamental. Assuming public ciphertext attributes rather than anonymity
may be useful in other contexts. For instance, suppose that a number of ciphertexts
are stored with varying attributes y on a server and we want to decrypt
only those for which $R(x, y) = 1$. Anonymous ciphertexts require decrypting all
of them whereas public attributes y make it possible to test whether $R(x, y) = 1$
(which is usually faster than decrypting) and only decrypt appropriate ones.

At the expense of sacrificing anonymity, we thus describe IPE schemes where
the ciphertext overhead reduces to O(1) as long as the description of the ciphertext
attribute vector is not considered as being part of the ciphertext, which
is a common assumption in the broadcast encryption/revocation applications.
In addition, the number of pairing evaluations to decrypt is also constant, which
significantly improves upon O(n), since pairing calculations still remain costly.

Our first IPE system achieves adaptive security, as opposed to the selective
model, used in [18], where the adversary has to choose the target ciphertext vector
Y upfront. To acquire adaptive security, we basically utilize the method used
in Waters’ fully secure IBE [27], albeit we also have to introduce a new trick
called “n-equation technique” so as to deal with the richer structure of IPE. Our
system directly yields the first adaptively secure identity-based broadcast encryption
scheme with constant-size ciphertexts in the standard model. Previous IBBE
with O(1)-size ciphertexts were either only selective-ID secure [?] or in the
random oracle model [?]. Among IBBE systems featuring compact ciphertexts
(including selective-ID secure ones), ours is also the first one relying on simple
assumptions (i.e., no q-type assumption) in prime order groups.

It is worth mentioning that techniques developed by Lewko and Waters [20]
can be applied to the construction of Boneh and Hamburg [9] to give fully
secure IBBE with short ciphertexts in composite order groups. However, it was
not previously known how to obtain such a scheme in prime order groups (at
least without relying on the absence of computable isomorphisms in asymmetric
pairing configurations). Indeed, despite recent progress [2], there is still no
black-box way to translate pairing-based cryptosystems from composite to prime
order groups. In particular, Freeman's framework [14] does not apply to [20].

Our second contribution is an IPE system for non-zero inner products: ciphertexts
encrypted for vector $Y$ can only be decrypted using $sk_X$ if $X \cdot Y \neq 0$, which,
without retaining anonymity, solves a question left open by Katz, Sahai and
Waters [T5, Section 5.4]. The scheme implies the first identity-based revocation
(IBR) mechanism [19] with $O(1)$-size ciphertexts. Like the schemes of Lewko,
Sahai and Waters [19], its security is analyzed in a non-adaptive model where
the adversary has to choose which users to corrupt at the outset of the game¹. In
comparison with [19], where ciphertexts grow linearly with the number of revoked
users and public/private keys have constant size, our basic IBR construction
performs in the dual way, since key sizes depend on the maximal number of revoked
users. Depending on the application, one may prefer one scheme over the other.
We actually show how to generalize both implementations and obtain a
tradeoff between ciphertext and key sizes (without assuming a maximal
number of revoked users): the second scheme of [19] and ours can be seen as
lying at opposite ends of the spectrum.

On the theoretical side, our non-zero IPE realization turns out to be a particular
case of a more general primitive, which we call negated spatial encryption and
define as a negated mode of the spatial encryption primitive of Boneh and
Hamburg [9]. Namely, keys correspond to subspaces and can decrypt ciphertexts
encrypted under points that lie outside the subspace. This generalized primitive
turns out to be non-trivial to implement, and we had to use a fully generalized
form of our new "n-equation" technique. The proposed scheme is proven secure
under a non-standard assumption defined in (T[[|.

¹ We indeed work in a slightly stronger model, called co-selective-ID, where the
adversary chooses which parties to corrupt at the beginning (before seeing the
public key) but is not required to announce the target revoked set until the
challenge phase.

Our Techniques. The core technique of our non-zero IPE scheme will be used
throughout the paper, including in our adaptively secure zero IPE scheme. This
is analogous to the fact that Waters' fully secure IBE [27] uses the revocation
technique of [19]. Our non-zero IPE also builds on [19]. However, the fact that
non-zero IPE has a much richer structure than revocation schemes, together with
the goal of achieving constant ciphertext size, prevents us from using their
techniques directly. To describe the difficulties that arise, we first outline the
Lewko-Sahai-Waters revocation scheme in a simplified form, where no security
proof is provided and only one user is revoked.

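Construction 1. (A simplified revocation scheme)

► Setup: lets $(\mathbb{G}, \mathbb{G}_T)$ be bilinear groups of prime order $p$ and picks $g \xleftarrow{R} \mathbb{G}$ and
$\alpha, \alpha_1, \alpha_2 \xleftarrow{R} \mathbb{Z}_p$. The public key is $(g, g^{\alpha_1}, g^{\alpha_2}, e(g,g)^{\alpha})$. The master key is $g^{\alpha}$.

► KeyGen: chooses $t \xleftarrow{R} \mathbb{Z}_p$ and outputs a private key for an identity $\mathrm{ID} \in \mathbb{Z}_p$ as
$(K_0 = g^t,\ K_1 = g^{\alpha + \alpha_1 t},\ K_2 = g^{t(\alpha_1 \mathrm{ID} + \alpha_2)})$.

► Encrypt: encrypts $M$ and specifies a revoked $\mathrm{ID}'$ by choosing $s \xleftarrow{R} \mathbb{Z}_p$ and
computing $(E_0 = M \cdot e(g,g)^{\alpha s},\ E_1 = g^{s(\alpha_1 \mathrm{ID}' + \alpha_2)},\ E_2 = g^s)$.

► Decrypt: decryption computes $e(K_2, E_2)^{\frac{1}{\mathrm{ID} - \mathrm{ID}'}} \cdot e(E_1, K_0)^{\frac{-1}{\mathrm{ID} - \mathrm{ID}'}} = e(g,g)^{\alpha_1 t s}$ if
$\mathrm{ID} \neq \mathrm{ID}'$. It then computes $e(g,g)^{\alpha s}$ as $e(K_1, E_2) / e(g,g)^{\alpha_1 t s}$.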


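The scheme can be explained by viewing a key and a ciphertext as forming a
linear system of 2 equations in the exponent of $e(g,g)$ with variables $\alpha_1 ts, \alpha_2 ts$:

$$M_{\mathrm{ID},\mathrm{ID}'} \begin{pmatrix} \alpha_1 ts \\ \alpha_2 ts \end{pmatrix} = \begin{pmatrix} \mathrm{ID} & 1 \\ \mathrm{ID}' & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 ts \\ \alpha_2 ts \end{pmatrix} = \begin{pmatrix} \log e(K_2, E_2) \\ \log e(E_1, K_0) \end{pmatrix}.$$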


Computing $e(g,g)^{\alpha_1 ts}$ amounts to solving the system, which is possible when
$\det(M_{\mathrm{ID},\mathrm{ID}'}) \neq 0$ (and thus $\mathrm{ID} \neq \mathrm{ID}'$, as required). In particular, decryption
computes a linear combination (in the exponent) with coefficients from the first
row of $M_{\mathrm{ID},\mathrm{ID}'}^{-1}$, which is $\left(\frac{1}{\mathrm{ID} - \mathrm{ID}'}, \frac{-1}{\mathrm{ID} - \mathrm{ID}'}\right)$. In [19], this is called the "2-equation
technique". The scheme is extended to $n$ dimensions, i.e., to the revocation of $n$ users
$\{\mathrm{ID}'_1, \ldots, \mathrm{ID}'_n\}$, by using $n$ local independent systems of two equations

$$M_{\mathrm{ID},\mathrm{ID}'_j} (\alpha_1 t s_j, \alpha_2 t s_j)^T = \big(\log e(K_2, E_{2,j}), \log e(E_{1,j}, K_0)\big)^T \quad \text{for } j \in [1, n],$$

which yield $2n$ ciphertext components $(E_{1,j}, E_{2,j})$, each of which corresponds
to a share $s_j$ of $s$ such that $s = \sum_{j=1}^{n} s_j$. Decryption of the $j$-th system returns
$e(g,g)^{\alpha_1 t s_j}$ if $\mathrm{ID} \neq \mathrm{ID}'_j$. Combining these results finally gives $e(g,g)^{\alpha_1 t s}$.
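The following Python sketch checks the 2-equation technique of Construction 1 numerically. It works in a toy model that we assume purely for illustration: each group element $g^a$ is represented by its exponent $a$ modulo a prime, and a pairing $e(g^a, g^b)$ by the product $ab$. The helper names (`inv`, `revocation_decrypt_exponent`) are hypothetical, and the model is in no way a secure or faithful implementation of pairing groups.

```python
import random

# Toy model (assumption, for illustration only): every group element g^a is
# represented by its exponent a mod PRIME, and a pairing e(g^a, g^b) by a*b.
# This checks the algebra of Construction 1 and offers no security at all.
PRIME = 2**61 - 1  # a prime standing in for the group order p


def inv(a: int) -> int:
    """Inverse modulo the prime PRIME (Fermat's little theorem)."""
    return pow(a % PRIME, PRIME - 2, PRIME)


def revocation_decrypt_exponent(ID: int, ID_rev: int, seed: int = 1):
    """Run Construction 1 in the exponent.

    Returns (recovered, expected): the log of the blinding factor computed by
    Decrypt and the true value alpha*s, or None when ID == ID' (revoked)."""
    if (ID - ID_rev) % PRIME == 0:
        return None  # det(M_{ID,ID'}) = ID - ID' = 0: the system is singular
    rng = random.Random(seed)
    alpha, a1, a2, t, s = (rng.randrange(1, PRIME) for _ in range(5))
    K0, K1, K2 = t, (alpha + a1 * t) % PRIME, t * (a1 * ID + a2) % PRIME  # key
    E1, E2 = s * (a1 * ID_rev + a2) % PRIME, s                            # ciphertext
    rhs = (K2 * E2 % PRIME, E1 * K0 % PRIME)  # logs of e(K2, E2) and e(E1, K0)
    c = inv(ID - ID_rev)                      # first row of M^{-1} is (c, -c)
    a1ts = (c * rhs[0] - c * rhs[1]) % PRIME  # log of e(g,g)^{alpha1 t s}
    # e(K1, E2) / e(g,g)^{alpha1 t s} should equal e(g,g)^{alpha s}
    return (K1 * E2 - a1ts) % PRIME, alpha * s % PRIME
```

Calling `revocation_decrypt_exponent(42, 42)` returns `None`, mirroring the fact that a revoked identity makes $\det(M_{\mathrm{ID},\mathrm{ID}'}) = \mathrm{ID} - \mathrm{ID}'$ vanish.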

We aim at constant-size ciphertexts for non-zero IPE schemes of dimension $n$.
When trying to use the 2-equation technique in $n$ dimensions, the following
difficulties arise. First, the "decryptability" condition $X \cdot Y \neq 0$ cannot be
decomposed as simply as that of the revocation scheme, which is the conjunction
of $\mathrm{ID} \neq \mathrm{ID}'_j$ for $j \in [1, n]$. Second, the resulting ciphertext size is $O(n)$.
Towards solving these, we introduce a technique called the "n-equation
technique". First, we use $n$ secret exponents $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)^T$ and let $\alpha_1$
function as the "master" exponent while $\alpha_2, \ldots, \alpha_n$ serve as "perturbed" factors.
Intuitively, we will set up a system of $n$ linear equations of the form

$$M_{X,Y} (\alpha_1 ts, \ldots, \alpha_n ts)^T = \big(\log e(K_{i_1}, E_{j_1}), \ldots, \log e(K_{i_n}, E_{j_n})\big)^T \qquad (1)$$

where $\{K_{i_k}\}$ and $\{E_{j_k}\}$ are elements of $\mathbb{G}$ defined for a key for $X$ and a ciphertext
for $Y$, respectively. At first, this generalized system seems to require linear-size
ciphertexts $(E_{j_1}, \ldots, E_{j_n})$. A trick to resolve this is to reuse ciphertext elements
throughout the system: we let $E_{j_k} = E_1 = g^s$ for $k \in [1, n-1]$. This effectively
yields a constraint $M_{X,Y} = (Q_X^T \; R^T)^T$, where $Q_X$ is an $(n-1) \times n$ matrix
parameterized only by $X$ and $R$ is a $1 \times n$ matrix. The remaining problem is
then to choose $M_{X,Y}$ in such a way that the system has a solution if $X \cdot Y \neq 0$
(the decryptability condition). To this end, we define


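$$M_{X,Y} = \begin{pmatrix} -x_2/x_1 & 1 & 0 & \cdots & 0 \\ -x_3/x_1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ -x_n/x_1 & 0 & 0 & \cdots & 1 \\ y_1 & y_2 & y_3 & \cdots & y_n \end{pmatrix} \qquad (2)$$

where it holds that $\det(M_{X,Y}) = (-1)^{n+1} X \cdot Y / x_1$. By translating this conceptual
view back into algorithms, we obtain a basic non-zero IPE scheme. From
this, we propose two schemes for non-zero IPE: the first one is a special case of
the negated spatial encryption scheme in section 5.1, while the second one is proven
secure under simple assumptions and given in section 5.2.
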
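The determinant identity can be spot-checked over a small prime field. The sketch below, in which the modulus `PRIME` and the helper names are illustrative assumptions, builds $M_{X,Y}$ as in Eq. (2) for random vectors with $x_1 \neq 0$ and compares $\det(M_{X,Y})$ with $(-1)^{n+1} X \cdot Y / x_1$.

```python
import random

PRIME = 1_000_003  # a prime modulus, chosen only for illustration


def det_mod(m, p=PRIME):
    """Determinant of a square matrix over Z_p by Gaussian elimination."""
    a = [[v % p for v in row] for row in m]
    n, det = len(a), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det = det * a[col][col] % p
        inv_piv = pow(a[col][col], p - 2, p)
        for r in range(col + 1, n):
            f = a[r][col] * inv_piv % p
            a[r] = [(v - f * u) % p for v, u in zip(a[r], a[col])]
    return det % p


def m_xy(X, Y, p=PRIME):
    """M_{X,Y} of Eq. (2): for i = 2..n, row i-1 has -x_i/x_1 in column 1 and
    a 1 in column i; the last row is Y."""
    n, inv_x1 = len(X), pow(X[0], p - 2, p)
    rows = []
    for i in range(1, n):
        row = [0] * n
        row[0], row[i] = (-X[i] * inv_x1) % p, 1
        rows.append(row)
    return rows + [[y % p for y in Y]]


def det_formula_holds(n, seed=0, p=PRIME):
    """Compare det(M_{X,Y}) with (-1)^{n+1} X.Y / x_1 for random X (x_1 != 0), Y."""
    rng = random.Random(seed)
    X = [rng.randrange(1, p) for _ in range(n)]
    Y = [rng.randrange(p) for _ in range(n)]
    dot = sum(x * y for x, y in zip(X, Y)) % p
    expected = (-1) ** (n + 1) * dot * pow(X[0], p - 2, p) % p
    return det_mod(m_xy(X, Y, p), p) == expected
```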
Organization. In the forthcoming sections, the syntax and the applications of
functional encryption are explained in sections 2 and 3. We describe our zero-mode
IPE system in section 4. Our negated schemes are detailed in section 5.

2 Definitions 

2.1 Syntax and Security Definition for Functional Encryption 

Let $R : \Sigma_k \times \Sigma_e \to \{0,1\}$ be a boolean function, where $\Sigma_k$ and $\Sigma_e$ denote the
"key attribute" and "ciphertext attribute" spaces. A functional encryption (FE)
scheme for $R$ consists of the following algorithms.

◦ Setup$(1^\lambda, des) \to (pk, msk)$: takes as input a security parameter $1^\lambda$ and a
scheme description $des$ (which usually describes the dimension $n$), and outputs
a master public key $pk$ and a master secret key $msk$.
◦ KeyGen$(x, msk) \to sk_x$: takes as input a key attribute $x \in \Sigma_k$ and the master
key $msk$. It outputs a private decryption key $sk_x$.




◦ Encrypt$(y, M, pk) \to C$: takes as input a ciphertext attribute $y \in \Sigma_e$, a message
$M \in \mathcal{M}$, and the public key $pk$. It outputs a ciphertext $C$.
◦ Decrypt$(C, y, sk_x, x) \to M$: given a ciphertext $C$ with its attribute $y$ and the
decryption key $sk_x$ with its attribute $x$, it outputs a message $M$ or $\perp$.

We require the standard correctness of decryption, that is, for all $\lambda$, all
$(pk, msk) \leftarrow$ Setup$(1^\lambda)$, all $x \in \Sigma_k$, all $sk_x \leftarrow$ KeyGen$(x, msk)$, and all $y \in \Sigma_e$:

◦ If $R(x, y) = 1$, then Decrypt(Encrypt$(y, M, pk)$, $sk_x$) $= M$.
◦ If $R(x, y) = 0$, then Decrypt(Encrypt$(y, M, pk)$, $sk_x$) $= \perp$ with probability nearly 1.

Terminology and Variants. We refer to any encryption primitive A that can
be cast as a functional encryption by specifying its corresponding function
$R^A : \Sigma_k^A \times \Sigma_e^A \to \{0,1\}$. For an FE primitive A, we can define two variants:

◦ Dual Variant, denoted by Dual(A), is defined by setting $\Sigma_k^{\mathrm{Dual}(A)} := \Sigma_e^A$,
$\Sigma_e^{\mathrm{Dual}(A)} := \Sigma_k^A$ and $R^{\mathrm{Dual}(A)}(y, x) := R^A(x, y)$. In a dual variant, the roles of
key and ciphertext attributes are swapped with respect to the original primitive.
◦ Negated Variant, denoted by Neg(A), is defined by using the same domains
as A and setting $R^{\mathrm{Neg}(A)}(x, y) = 1 \iff R^A(x, y) = 0$. The condition is thus the
opposite of that of the original primitive.
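Viewed as operations on relations, the two variants are simple combinators. In this hypothetical Python sketch, a relation is any function returning 0 or 1; `dual`, `neg` and the example `zero_ipe` are illustrative names, not part of any library.

```python
from typing import Any, Callable

# A relation R : (key attribute, ciphertext attribute) -> {0, 1}
Relation = Callable[[Any, Any], int]


def dual(R: Relation) -> Relation:
    """R^{Dual(A)}(y, x) = R^A(x, y): key and ciphertext attributes swap roles."""
    return lambda key_attr, ct_attr: R(ct_attr, key_attr)


def neg(R: Relation) -> Relation:
    """R^{Neg(A)}(x, y) = 1 iff R^A(x, y) = 0."""
    return lambda x, y: 1 - R(x, y)


def zero_ipe(p: int) -> Relation:
    """Example relation (zero-mode IPE over Z_p^n): 1 iff X . Y = 0 mod p."""
    return lambda X, Y: int(sum(a * b for a, b in zip(X, Y)) % p == 0)
```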

Security Definition. An FE scheme for a function $R : \Sigma_k \times \Sigma_e \to \{0,1\}$ is fully
secure if no PPT adversary A has non-negligible advantage in the following game.

Setup. The challenger runs Setup(n) and hands the public key pk to A. 

Query Phase 1. The challenger answers private key queries for $x \in \Sigma_k$ by
returning $sk_x \leftarrow$ KeyGen$(x, msk)$.

Challenge. A submits messages $M_0, M_1$ and a target ciphertext attribute vector
$y^* \in \Sigma_e$ such that $R(x, y^*) = 0$ for all key attributes $x$ that have been queried
so far. The challenger then flips a bit $\beta \xleftarrow{R} \{0,1\}$ and computes the challenge
ciphertext $C^* \leftarrow$ Encrypt$(y^*, M_\beta, pk)$, which is given to A.

Query Phase 2. The adversary is allowed to make further private key queries
$x \in \Sigma_k$ under the same restriction as above, i.e., $R(x, y^*) = 0$.

Guess. The adversary A outputs a guess $\beta' \in \{0,1\}$ and wins if $\beta' = \beta$. In the
game, A's advantage is typically defined as $\mathrm{Adv}(A) = |\Pr[\beta = \beta'] - 1/2|$.

(Co-)Selective Security. We also consider the notion of selective security
cm, where A has to choose the challenge attribute $y^*$ before the setup phase
but can adaptively choose the key queries for $x_1, \ldots, x_q$. One can consider its
"dual" notion, where A must output the $q$ key queries for attribute vectors
$x_1, \ldots, x_q$ before the setup phase but can adaptively choose the target challenge
attribute $y^*$. We refer to this scenario as the co-selective security model,
which is useful in some applications such as revocation. By definition, the two
notions are incomparable in general, and their precise relation is not known yet.

We shall show how one FE primitive can be obtained from another. The
following useful lemma from [9] describes a sufficient criterion for implication.
Proposition 1 (Embedding Lemma [9]). Consider encryption primitives
A, B that can be cast as functional encryption for functions $R^A, R^B$,
respectively. Suppose there exist efficient injective mappings $f_k : \Sigma_k^A \to \Sigma_k^B$ and
$f_e : \Sigma_e^A \to \Sigma_e^B$ such that $R^B(f_k(x), f_e(y)) = 1 \iff R^A(x, y) = 1$. Let $\Pi_B$ be a
construction for primitive B. We then construct $\Pi_A$ for primitive A from $\Pi_B$ by
applying the mappings $f_k, f_e$ to all key attributes and ciphertext attributes, respectively.
More precisely, we use exactly the same setup algorithm and define the key generation
and encryption procedures as $\Pi_A$.KeyGen$(x, msk) := \Pi_B$.KeyGen$(f_k(x), msk)$
and $\Pi_A$.Encrypt$(y, M, pk) := \Pi_B$.Encrypt$(f_e(y), M, pk)$, respectively. Then, if $\Pi_B$
is secure, so is $\Pi_A$. This holds for the adaptive, selective, and co-selective security models.
We denote this primitive implication by $B \Rightarrow A$.
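Operationally, the embedding lemma is a wrapper that reuses Setup and Decrypt and pre-processes attributes. The sketch below shows only this plumbing: `ToyFE` is a hypothetical, deliberately insecure stand-in for $\Pi_B$ that merely checks $R^B$, and `Embedded` builds $\Pi_A$ from it with the maps $f_k, f_e$. Neither class reflects any real cryptographic library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToyFE:
    """Insecure stand-in for a scheme Pi_B: it merely enforces R^B and is used
    only to exercise the plumbing of the embedding lemma (hypothetical)."""
    R: Callable[[Any, Any], int]

    def setup(self):
        return "pk", "msk"

    def keygen(self, x, msk):
        return ("sk", x)

    def encrypt(self, y, M, pk):
        return ("ct", y, M)

    def decrypt(self, C, sk) -> Optional[Any]:
        _, y, M = C
        _, x = sk
        return M if self.R(x, y) == 1 else None  # None stands for "bottom"


@dataclass
class Embedded:
    """Pi_A built from Pi_B by mapping key attributes with f_k and ciphertext
    attributes with f_e; Setup and Decrypt are reused unchanged."""
    base: Any
    f_k: Callable[[Any], Any]
    f_e: Callable[[Any], Any]

    def setup(self):
        return self.base.setup()

    def keygen(self, x, msk):
        return self.base.keygen(self.f_k(x), msk)

    def encrypt(self, y, M, pk):
        return self.base.encrypt(self.f_e(y), M, pk)

    def decrypt(self, C, sk):
        return self.base.decrypt(C, sk)
```

For example, wrapping a `ToyFE` for zero-mode IPE over $\mathbb{Z}_p^3$ with `f_k=tuple` and `f_e=lambda w: (1, w, w * w % p)` gives zero evaluation of quadratic polynomials, i.e., the polynomial embedding of section 3.1.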

We immediately obtain the next corollary stating that the implication applies 
to the negated (resp. dual) variant with the same (resp. swapped) mappings. 

Corollary 1. $B \Rightarrow A$ implies $\mathrm{Dual}(B) \Rightarrow \mathrm{Dual}(A)$ and $\mathrm{Neg}(B) \Rightarrow \mathrm{Neg}(A)$.


2.2 Complexity Assumptions in Bilinear Groups 

We consider groups $(\mathbb{G}, \mathbb{G}_T)$ of prime order $p$ with an efficiently computable map
$e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ such that $e(g^a, h^b) = e(g, h)^{ab}$ for any $(g, h) \in \mathbb{G} \times \mathbb{G}$ and $a, b \in \mathbb{Z}$,
and $e(g, h) \neq 1_{\mathbb{G}_T}$ whenever $g, h \neq 1_{\mathbb{G}}$. In these groups, we assume the hardness
of the Decision Bilinear Diffie-Hellman and Decision Linear [5] problems.

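Definition 1. The Decision Bilinear Diffie-Hellman Problem (DBDH) in
$(\mathbb{G}, \mathbb{G}_T)$ is, given elements $(g, g^{\theta_1}, g^{\theta_2}, g^{\theta_3}, \eta) \in \mathbb{G}^4 \times \mathbb{G}_T$ with $\theta_1, \theta_2, \theta_3 \xleftarrow{R} \mathbb{Z}_p$, to
decide whether $\eta = e(g, g)^{\theta_1 \theta_2 \theta_3}$ or $\eta \in_R \mathbb{G}_T$.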

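Definition 2. The Decision Linear Problem (DLIN) in $\mathbb{G}$ consists in, given
a tuple $(g, f, \nu, g^{\theta_1}, f^{\theta_2}, \eta) \in \mathbb{G}^6$ with $\theta_1, \theta_2 \xleftarrow{R} \mathbb{Z}_p$ and $f, g, \nu \xleftarrow{R} \mathbb{G}$, deciding
whether $\eta = \nu^{\theta_1 + \theta_2}$ or $\eta \in_R \mathbb{G}$.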

3 Functional Encryption Instances and Their Implications 

3.1 Inner Product Encryption and Its Consequences 

We underline the power of IPE by showing its implications in this section. Each
primitive is defined by describing the corresponding boolean function $R$. We
then show how to construct one primitive from another by explicitly describing
attribute mappings. In this way, correctness and security are consequences of the
embedding lemma. Basically, the approach is exactly the same as in m.
A new contribution is that we also consider the negated variants of primitives,
which will be useful for non-zero polynomial evaluation and revocation schemes.
The implication for negated variants follows from Corollary 1.

Inner Product. An inner product encryption (IPE) scheme over $\mathbb{Z}_p^n$, for some
prime $p$, is defined as follows. Both attribute domains are $\Sigma_k^{\mathrm{IPE}_n} = \Sigma_e^{\mathrm{IPE}_n} = \mathbb{Z}_p^n$.
We consider two distinct IPE modes here. The first one is zero-mode IPE, where
$R^{\mathrm{ZIPE}_n}(X, Y) = 1$ iff $X \cdot Y = 0$. The other one is its negated primitive, which we
call non-zero-mode IPE, where $R^{\mathrm{NIPE}_n}(X, Y) = 1$ iff $X \cdot Y \neq 0$.

Polynomial Evaluation. Functional encryption for the zero evaluation of polynomials
of degree $\le n$ is defined as follows. The ciphertext and key attribute
domains are $\Sigma_e^{\mathrm{ZPoly}_{\le n}} = \mathbb{Z}_p$ and $\Sigma_k^{\mathrm{ZPoly}_{\le n}} = \{P \in \mathbb{Z}_p[x] \mid \deg(P) \le n\}$,
respectively. The relation is defined by $R^{\mathrm{ZPoly}_{\le n}}(P, x) = 1$ iff $P(x) = 0$. The non-zero
evaluation mode can be defined as its negated primitive $\mathrm{Neg}(\mathrm{ZPoly}_{\le n})$.

Given an IPE scheme over $\mathbb{Z}_p^{n+1}$, one obtains a functional encryption system
for polynomial evaluation via the following embedding. For the key attribute,
we map the polynomial $P[X] = p_0 + p_1 X + \cdots + p_n X^n$ to $X_P = (p_0, \ldots, p_n)$.
Regarding ciphertext attributes, each element $w \in \mathbb{Z}_p$ is mapped onto a vector
$Y_w = (1, w, w^2, \ldots, w^n)$. Correctness and security hold since $X_P \cdot Y_w = P(w)$, so
that $P(w) = 0$ if and only if $X_P \cdot Y_w = 0$. The non-zero evaluation case can be
analogously derived from non-zero-mode IPE using the same mappings, due to Corollary 1.
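A quick numerical check of this embedding: the inner product of $X_P$ and $Y_w$ is exactly $P(w)$ evaluated in $\mathbb{Z}_p$. The modulus and function names in this Python sketch are illustrative assumptions.

```python
PRIME = 1_000_003  # a prime modulus, chosen only for illustration


def key_vector(coeffs):
    """X_P = (p_0, ..., p_n) for P[X] = p_0 + p_1 X + ... + p_n X^n."""
    return [c % PRIME for c in coeffs]


def ct_vector(w, n):
    """Y_w = (1, w, w^2, ..., w^n)."""
    return [pow(w, i, PRIME) for i in range(n + 1)]


def inner(X, Y):
    return sum(a * b for a, b in zip(X, Y)) % PRIME


def evaluate(coeffs, w):
    """P(w) mod PRIME by Horner's rule, for comparison."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * w + c) % PRIME
    return acc
```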

We can also consider other variants, such as a scheme that supports multivariate
polynomials and a dual variant, where the key attribute corresponds to a
fixed point and the ciphertext attribute corresponds to a polynomial, as in BBJ- 

OR, AND, DNF, CNF Formulae. We now consider an FE scheme for some
boolean formulae that evaluate disjunctions, conjunctions, and their extensions
to disjunctive or conjunctive normal forms. As an example, a functional encryption
scheme for the boolean formula $R^{\mathrm{OR}_{\le n}} : \mathbb{Z}_p^{\le n} \times \mathbb{Z}_p \to \{0,1\}$ can be defined
by $R^{\mathrm{OR}_{\le n}}((I_1, \ldots, I_k), z) = 1$ (for $k \le n$) iff $(z = I_1)$ or $\cdots$ or $(z = I_k)$. This
can be obtained from a functional encryption for the zero evaluation of a univariate
polynomial of degree at most $n$ by generating a private key for
$f_{\mathrm{OR}, I_1, \ldots, I_k}(z) = (z - I_1) \cdots (z - I_k)$ and letting senders encrypt to $z$.

Other fundamental cases can be considered similarly as in [IT5] and are shown
below. In [TH], only non-negated policies (the first three cases below and their
extensions) were considered. Schemes supporting negated policies (the latter
three cases below and their extensions) are introduced in this paper. The negated
cases can be implemented by IPE for non-zero evaluation. One can combine these
cases to obtain DNF and CNF formulae. Below, $r \xleftarrow{R} \mathbb{Z}_p$ is chosen by KeyGen.²


Policy | Implementation
$(z = I_1)$ or $(z = I_2)$ | $f_{\mathrm{OR},I_1,I_2}(z) = (z - I_1)(z - I_2) = 0$
$(z_1 = I_1)$ or $(z_2 = I_2)$ | $f_{\mathrm{OR},I_1,I_2}(z_1, z_2) = (z_1 - I_1)(z_2 - I_2) = 0$
$(z_1 = I_1)$ and $(z_2 = I_2)$ | $f_{\mathrm{AND},I_1,I_2}(z_1, z_2) = (z_1 - I_1) r + (z_2 - I_2) = 0$
$(z_1 \neq I_1)$ or $(z_2 \neq I_2)$ | $f_{\mathrm{NOR},I_1,I_2}(z_1, z_2) = (z_1 - I_1) r + (z_2 - I_2) \neq 0$
$(z \neq I_1)$ and $(z \neq I_2)$ | $f_{\mathrm{NAND},I_1,I_2}(z) = (z - I_1)(z - I_2) \neq 0$
$(z_1 \neq I_1)$ and $(z_2 \neq I_2)$ | $f_{\mathrm{NAND},I_1,I_2}(z_1, z_2) = (z_1 - I_1)(z_2 - I_2) \neq 0$
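The table rows can be checked by brute force over random inputs. The Python sketch below (names and the modulus are illustrative assumptions) compares the zero/non-zero behaviour of $f_{\mathrm{AND}}$ and of the two-variable $f_{\mathrm{OR}}$ with the corresponding boolean policies; the negated rows are obtained by replacing "$= 0$" with "$\neq 0$", i.e., by negating these tests. For AND, the equivalence holds only with overwhelming probability over the KeyGen randomness $r$, which is why $r$ is drawn from a large prime field here.

```python
import random

PRIME = 2**61 - 1  # prime modulus standing in for p (illustrative assumption)


def f_and(I1, I2, r):
    """(z1 - I1) r + (z2 - I2): zero iff z1 = I1 and z2 = I2, except with
    probability about 1/p over the KeyGen randomness r."""
    return lambda z1, z2: ((z1 - I1) * r + (z2 - I2)) % PRIME


def f_or(I1, I2):
    """(z1 - I1)(z2 - I2): zero iff z1 = I1 or z2 = I2."""
    return lambda z1, z2: (z1 - I1) * (z2 - I2) % PRIME


def policies_agree(seed=0, trials=500):
    """Check that zero evaluation matches the AND and OR policies."""
    rng = random.Random(seed)
    I1, I2 = 17, 29
    r = rng.randrange(1, PRIME)  # chosen by KeyGen
    g_and, g_or = f_and(I1, I2, r), f_or(I1, I2)
    for _ in range(trials):
        z1 = rng.choice([I1, rng.randrange(PRIME)])
        z2 = rng.choice([I2, rng.randrange(PRIME)])
        if (g_and(z1, z2) == 0) != (z1 == I1 and z2 == I2):
            return False
        if (g_or(z1, z2) == 0) != (z1 == I1 or z2 == I2):
            return False
    return True
```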


ID-based Broadcast Encryption and Revocation. Let $\mathcal{I}$ be an identity
space. An ID-based broadcast encryption (IBBE) scheme for at most $n$ receivers
per ciphertext is a functional encryption for $R^{\mathrm{IBBE}_{\le n}} : \mathcal{I} \times 2^{\mathcal{I}} \to \{0,1\}$ defined
by $R^{\mathrm{IBBE}_{\le n}}(\mathrm{ID}, S) = 1$ iff $\mathrm{ID} \in S$. An IBBE system can be constructed from a
functional encryption for $R^{\mathrm{Dual}(\mathrm{OR}_{\le n})}$. To encrypt a message for the receiver set
$S = \{\mathrm{ID}_1, \ldots, \mathrm{ID}_k\}$, one encrypts using the policy $(z = \mathrm{ID}_1)$ or $\cdots$ or $(z = \mathrm{ID}_k)$.

² As noted in |JH|, the AND (and NOR) case will not work in the adaptive security
model since the information on $r$ leaks.

Likewise, identity-based revocation (IBR) [19] for at most $n$ revocations per
ciphertext can be cast as a negated IBBE, i.e., $R^{\mathrm{IBR}_{\le n}}(\mathrm{ID}, R) = 1$ iff $\mathrm{ID} \notin R$.
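The IBBE and IBR embeddings can be sketched concretely: the receiver (or revoked) set becomes the coefficient vector of $\prod_i (z - \mathrm{ID}_i)$, and a key for $\mathrm{ID}$ becomes $(1, \mathrm{ID}, \ldots, \mathrm{ID}^n)$, so that their inner product is the policy polynomial evaluated at $\mathrm{ID}$. This Python sketch assumes identities are already elements of $\mathbb{Z}_p$ (in practice they would be hashed); the function names are hypothetical.

```python
PRIME = 2**61 - 1  # prime modulus; identities are assumed to lie in Z_p


def poly_from_roots(roots, n):
    """Coefficients (c_0, ..., c_n) of prod_i (z - ID_i), zero-padded to n+1."""
    if len(roots) > n:
        raise ValueError("at most n identities per ciphertext")
    coeffs = [1]
    for root in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):       # multiply by (z - root)
            nxt[i] = (nxt[i] - root * c) % PRIME
            nxt[i + 1] = (nxt[i + 1] + c) % PRIME
        coeffs = nxt
    return coeffs + [0] * (n + 1 - len(coeffs))


def id_vector(ID, n):
    """Key-side vector (1, ID, ID^2, ..., ID^n)."""
    return [pow(ID, i, PRIME) for i in range(n + 1)]


def ibbe_can_decrypt(ID, S, n):
    """Zero-mode IPE test: the inner product is f(ID), zero iff ID is in S."""
    X, Y = id_vector(ID, n), poly_from_roots(S, n)
    return sum(a * b for a, b in zip(X, Y)) % PRIME == 0


def ibr_can_decrypt(ID, R, n):
    """Non-zero-mode IPE test: decryption works iff ID is not revoked."""
    return not ibbe_can_decrypt(ID, R, n)
```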

3.2 Spatial Encryption 

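We now recall the concept of spatial encryption [9]. For an $n \times d$ matrix $M$ whose
elements are in $\mathbb{Z}_p$ and a vector $c \in \mathbb{Z}_p^n$, we define its corresponding affine
space as $\mathrm{Aff}(M, c) = \{M w + c \mid w \in \mathbb{Z}_p^d\}$. Let $V_n \subset 2^{\mathbb{Z}_p^n}$ be the collection of all
affine spaces inside $\mathbb{Z}_p^n$. That is, $V_n = \{\mathrm{Aff}(M, c) \mid M \in \mathcal{M}_{n \times d}, c \in \mathbb{Z}_p^n, d \le n\}$,
where $\mathcal{M}_{n \times d}$ is the set of all $n \times d$ matrices over $\mathbb{Z}_p$.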

A spatial encryption in $\mathbb{Z}_p^n$ is a functional encryption for a relation
$R^{\mathrm{Spatial}} : V_n \times \mathbb{Z}_p^n \to \{0,1\}$ defined by $R^{\mathrm{Spatial}}(V, y) = 1$ iff $y \in V$.
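Deciding $R^{\mathrm{Spatial}}$ is plain linear algebra: $y \in \mathrm{Aff}(M, c)$ exactly when $y - c$ lies in the column space of $M$, i.e., when appending $y - c$ as a column does not increase the rank. A minimal Python sketch over $\mathbb{Z}_p$, with an illustrative prime and hypothetical helper names:

```python
PRIME = 1_000_003  # a prime modulus, chosen only for illustration


def rank_mod(rows, p=PRIME):
    """Rank of a matrix (given as a list of rows) over Z_p, by Gauss-Jordan."""
    a = [[v % p for v in row] for row in rows]
    rank = 0
    ncols = len(a[0]) if a else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(a)) if a[r][col]), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        inv = pow(a[rank][col], p - 2, p)
        a[rank] = [v * inv % p for v in a[rank]]
        for r in range(len(a)):
            if r != rank and a[r][col]:
                f = a[r][col]
                a[r] = [(v - f * u) % p for v, u in zip(a[r], a[rank])]
        rank += 1
    return rank


def in_affine_space(M, c, y, p=PRIME):
    """R^Spatial(Aff(M, c), y): y - c is in the column space of M (n x d)."""
    augmented = [row + [(yi - ci) % p] for row, yi, ci in zip(M, y, c)]
    return rank_mod(M, p) == rank_mod(augmented, p)
```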

The notion of spatial encryption was introduced by Boneh and Hamburg [9]. It
has many applications, as it notably implies broadcast HIBE and multi-authority
schemes. Nevertheless, its connection to inner-product encryption has not been
investigated so far. In section 4.1, we prove that spatial encryption implies inner
product encryption by providing a simple attribute mapping.

As a result of independent interest, we also consider the negated spatial
encryption primitive (namely, the FE defined by $R^{\mathrm{Neg}(\mathrm{Spatial})}(V, y) = 1$ iff
$y \notin V$) and provide a construction in section 5.1. From this scheme and Corollary 1,
together with our implication of zero-mode IPE from spatial encryption,
we then obtain a non-zero-mode IPE construction.

4 Functional Encryption for Zero Inner-Product 

4.1 Warm-Up: Selectively Secure Zero IPE from Spatial Encryption 

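We first show that spatial encryption implies zero IPE. For the key attribute,
we map $X = (x_1, \ldots, x_n)^T \in \mathbb{Z}_p^n$ (with $x_1 \neq 0$) to an $(n-1)$-dimensional affine space
$V_X = \mathrm{Aff}(M_X, 0_n) = \{M_X w + 0_n \mid w \in \mathbb{Z}_p^{n-1}\}$ with the matrix $M_X \in \mathbb{Z}_p^{n \times (n-1)}$

$$M_X = \begin{pmatrix} -x_2/x_1 & -x_3/x_1 & \cdots & -x_n/x_1 \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$

For any $Y = (y_1, \ldots, y_n)^T \in \mathbb{Z}_p^n$, we then have $X \cdot Y = 0 \iff Y \in V_X$, since

$$X \cdot Y = 0 \iff y_1 = y_2 \cdot \left(-\tfrac{x_2}{x_1}\right) + \cdots + y_n \cdot \left(-\tfrac{x_n}{x_1}\right) \iff Y = M_X \cdot (y_2, \ldots, y_n)^T \iff Y \in V_X.$$

By the embedding lemma, we can therefore conclude that spatial encryption implies zero IPE.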
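The mapping $X \mapsto M_X$ can be checked numerically. In the Python sketch below (illustrative prime, hypothetical names), a vector $Y$ orthogonal to $X$ is reproduced by $M_X \cdot (y_2, \ldots, y_n)^T$, while perturbing $y_1$ (which makes $X \cdot Y = x_1 \neq 0$) breaks the equality.

```python
import random

PRIME = 1_000_003  # a prime modulus, chosen only for illustration


def m_x(X, p=PRIME):
    """n x (n-1) matrix M_X: first row (-x_2/x_1, ..., -x_n/x_1), then I_{n-1}."""
    n, inv_x1 = len(X), pow(X[0], p - 2, p)
    first = [(-xi * inv_x1) % p for xi in X[1:]]
    identity = [[int(i == j) for j in range(n - 1)] for i in range(n - 1)]
    return [first] + identity


def mat_vec(M, w, p=PRIME):
    return [sum(a * b for a, b in zip(row, w)) % p for row in M]


def mapping_holds(n, seed=0, p=PRIME):
    rng = random.Random(seed)
    X = [rng.randrange(1, p) for _ in range(n)]   # x_1 != 0 is required
    tail = [rng.randrange(p) for _ in range(n - 1)]
    # Solve X . Y = 0 for y_1, given y_2..y_n
    y1 = -sum(x * y for x, y in zip(X[1:], tail)) * pow(X[0], p - 2, p) % p
    Y_zero = [y1] + tail                  # X . Y_zero = 0
    Y_other = [(y1 + 1) % p] + tail       # X . Y_other = x_1 != 0
    M = m_x(X, p)
    return (mat_vec(M, Y_zero[1:], p) == Y_zero
            and mat_vec(M, Y_other[1:], p) != Y_other)
```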

In [9], Boneh and Hamburg described a selectively secure construction of 
spatial encryption that achieves constant-size ciphertexts (by generalizing the 
Boneh-Boyen-Goh HIBE [!>]). We thus immediately obtain a selectively secure 
zero IPE scheme with constant-size ciphertext as shown below. 
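We give some notation here. For a vector $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)^T \in \mathbb{Z}_p^n$, we write
$g^{\boldsymbol{\alpha}}$ to denote $(g^{\alpha_1}, \ldots, g^{\alpha_n})^T$. Given $g^{\boldsymbol{\alpha}}$ and $z$, one can easily compute $(g^{\boldsymbol{\alpha}})^{z} := g^{\langle \boldsymbol{\alpha}, z \rangle}$,
where $\langle \boldsymbol{\alpha}, z \rangle$ denotes the inner product $\boldsymbol{\alpha} \cdot z = \boldsymbol{\alpha}^T z$.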

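Construction 2. (Selectively secure zero IPE)

► Setup$(1^\lambda, n)$: chooses bilinear groups $(\mathbb{G}, \mathbb{G}_T)$ of prime order $p > 2^\lambda$ with
a generator $g \xleftarrow{R} \mathbb{G}$. It chooses $\alpha, \alpha_0, \alpha_1, \ldots, \alpha_n \xleftarrow{R} \mathbb{Z}_p$. Let $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)$. The
public key is $pk = (g, g^{\alpha_0}, H = g^{\boldsymbol{\alpha}}, Z = e(g,g)^{\alpha})$. The master key is $msk = g^{\alpha}$.

► KeyGen$(X, msk, pk)$: chooses $t \xleftarrow{R} \mathbb{Z}_p$, parses $X$ as $(x_1, \ldots, x_n)$ and returns
$\perp$ if $x_1 = 0$. It outputs the private key as $sk_X = (D_0, D_1, K_2, \ldots, K_n)$, where

$$D_0 = g^t, \quad D_1 = g^{\alpha + \alpha_0 t}, \quad \{K_i = (g^{-\alpha_1 x_i / x_1} \cdot g^{\alpha_i})^t\}_{i=2,\ldots,n}.$$

► Encrypt$(Y, M, pk)$: the encryption algorithm first picks $s \xleftarrow{R} \mathbb{Z}_p$. It parses $Y$ as
$(y_1, \ldots, y_n)$ and computes the ciphertext as

$$E_0 = M \cdot e(g,g)^{\alpha s}, \quad E_1 = (g^{\alpha_0} \cdot g^{\langle \boldsymbol{\alpha}, Y \rangle})^s, \quad E_2 = g^s.$$

► Decrypt$(C, Y, sk_X, X, pk)$: to decrypt, the algorithm computes the message
blinding factor as

$$\frac{e\big(D_1 \cdot \prod_{i=2}^{n} K_i^{y_i}, E_2\big)}{e(E_1, D_0)} = e(g,g)^{\alpha s}.$$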
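To see why the blinding factor comes out as $e(g,g)^{\alpha s}$ exactly when $X \cdot Y = 0$, the following Python sketch replays Construction 2 in a toy exponent model: each group element is replaced by its discrete logarithm modulo a prime, and each pairing by a product of exponents. This model is an assumption for checking the algebra only and provides no security; the function name is hypothetical.

```python
import random

PRIME = 2**61 - 1  # a prime standing in for the group order p (toy model)


def inv(a):
    return pow(a % PRIME, PRIME - 2, PRIME)


def construction2_blinding(X, Y, seed=0):
    """Replay Construction 2 on discrete logs: returns (log of the value computed
    by Decrypt, log of e(g,g)^{alpha s}), both base e(g,g)."""
    rng = random.Random(seed)
    n = len(X)
    alpha, alpha0, t, s = (rng.randrange(1, PRIME) for _ in range(4))
    a = [rng.randrange(1, PRIME) for _ in range(n)]   # alpha_1, ..., alpha_n
    # KeyGen: D0 = g^t, D1 = g^{alpha + alpha0 t}, K_i = (g^{-a_1 x_i/x_1} g^{a_i})^t
    D0, D1 = t, (alpha + alpha0 * t) % PRIME
    K = {i: t * (a[i] - a[0] * X[i] * inv(X[0])) % PRIME for i in range(1, n)}
    # Encrypt: E1 = (g^{alpha0} g^{<alpha, Y>})^s, E2 = g^s
    E1 = s * (alpha0 + sum(ai * yi for ai, yi in zip(a, Y))) % PRIME
    E2 = s
    # Decrypt: e(D1 * prod_{i>=2} K_i^{y_i}, E2) / e(E1, D0)
    agg = (D1 + sum(K[i] * Y[i] for i in range(1, n))) % PRIME
    return (agg * E2 - E1 * D0) % PRIME, alpha * s % PRIME
```

With $X \cdot Y \neq 0$, the recovered value differs from $\alpha s$ by $-ts\,\alpha_1 (X \cdot Y)/x_1$, which is why decryption fails for non-orthogonal vectors.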

The selective security of this scheme is a consequence of a result given in [9].

Theorem 1. Construction 2 is selectively secure under the n-Decisional Bilinear
Diffie-Hellman Exponent assumption (see [9] for a description of the latter).

4.2 Adaptively Secure Zero IPE under Simple Assumptions 

We extend the above selectively secure zero IPE to acquire adaptive security
by applying Waters' dual system method [27]. However, we have to use our
"n-equation technique", as opposed to the 2-equation technique used for IBE in [27].
The reason is that we have to deal with the difficulties arising from the richer
structure of IPE and from the aggregation of ciphertexts into a constant number of
elements, analogously to what we described in section 1.

The scheme basically works as follows. A ciphertext contains a random tag $\mathsf{tagc}$
in the element $E_1$, while each key contains $n - 1$ tags ($\mathsf{tagk}_i$ for each $K_i$ element)
that are aggregated into $\mathsf{tagk} = \sum_{i=2}^{n} \mathsf{tagk}_i\, y_i$ upon decryption of a ciphertext
intended for $Y$. The receiver can decrypt if $\mathsf{tagk} \neq \mathsf{tagc}$ (and $X \cdot Y = 0$).

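Construction 3. (Adaptively secure zero IPE)

► Setup$(1^\lambda, n)$: chooses bilinear groups $(\mathbb{G}, \mathbb{G}_T)$ of prime order $p > 2^\lambda$. It then
picks generators $g, v, v_1, v_2 \xleftarrow{R} \mathbb{G}$ and chooses $\alpha, \alpha_0, \alpha_1, \ldots, \alpha_n, a_1, a_2, b \xleftarrow{R} \mathbb{Z}_p$.
Let $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)$ and $H = (h_1, \ldots, h_n) = g^{\boldsymbol{\alpha}}$. The public key consists of

$$pk = \left( \begin{array}{l} g,\ w = g^{\alpha_0},\ Z = e(g,g)^{\alpha \cdot a_1 \cdot b},\ H = g^{\boldsymbol{\alpha}},\ A_1 = g^{a_1},\ A_2 = g^{a_2},\ B = g^b, \\ B_1 = g^{b \cdot a_1},\ B_2 = g^{b \cdot a_2},\ \tau_1 = v \cdot v_1^{a_1},\ \tau_2 = v \cdot v_2^{a_2},\ T_1 = \tau_1^b,\ T_2 = \tau_2^b \end{array} \right).$$

The master key is defined to be $msk = (g^{\alpha}, g^{\alpha \cdot a_1}, v, v_1, v_2)$.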
► KeyGen$(X, msk, pk)$: parses $X$ as $(x_1, \ldots, x_n)$ and returns $\perp$ if $x_1 = 0$. Otherwise,
it picks $r_1, r_2, z_1, z_2, \mathsf{tagk}_2, \ldots, \mathsf{tagk}_n \xleftarrow{R} \mathbb{Z}_p$, sets $r = r_1 + r_2$
and generates $sk_X = (D_1, \ldots, D_7, K_2, \ldots, K_n, \mathsf{tagk}_2, \ldots, \mathsf{tagk}_n)$ by computing

$$sk_{core} = \{K_i = (g^{-\alpha_1 x_i / x_1} \cdot g^{\alpha_i} \cdot g^{\alpha_0 \mathsf{tagk}_i})^{r_1}\}_{i=2,\ldots,n},$$

$$D_1 = g^{\alpha a_1} \cdot v^r, \quad D_2 = g^{-\alpha} \cdot v_1^r \cdot g^{z_1}, \quad D_3 = B^{-z_1}, \quad D_4 = v_2^r \cdot g^{z_2},$$

$$D_5 = B^{-z_2}, \quad D_6 = B^{r_2}, \quad D_7 = g^{r_1}.$$

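► Encrypt$(Y, M, pk)$: to encrypt $M \in \mathbb{G}_T$ under $Y = (y_1, \ldots, y_n) \in \mathbb{Z}_p^n$, pick
$s_1, s_2, t, \mathsf{tagc} \xleftarrow{R} \mathbb{Z}_p$ and compute $C = (C_1, \ldots, C_7, E_0, E_1, E_2, \mathsf{tagc})$, where

$$C_{core} = \big(E_0 = M \cdot Z^{s_2},\ E_1 = (g^{\alpha_0 \mathsf{tagc}} \cdot g^{\langle \boldsymbol{\alpha}, Y \rangle})^t,\ E_2 = g^t\big),$$

$$C_1 = B^{s_1 + s_2}, \quad C_2 = B_1^{s_1}, \quad C_3 = A_1^{s_1}, \quad C_4 = B_2^{s_2},$$

$$C_5 = A_2^{s_2}, \quad C_6 = \tau_1^{s_1} \cdot \tau_2^{s_2}, \quad C_7 = T_1^{s_1} \cdot T_2^{s_2} \cdot w^{-t}.$$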

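► Decrypt$(C, Y, sk_X, X, pk)$: computes $\mathsf{tagk} = \mathsf{tagk}_2 y_2 + \cdots + \mathsf{tagk}_n y_n$ and then

$$W_1 = \frac{\prod_{j=1}^{5} e(C_j, D_j)}{\prod_{j=6}^{7} e(C_j, D_j)} = e(g,g)^{\alpha \cdot a_1 \cdot b \cdot s_2} \cdot e(g,w)^{r_1 t},$$

as well as $W_2 = \left( \frac{e(E_1, D_7)}{e(E_2, \prod_{i=2}^{n} K_i^{y_i})} \right)^{\frac{1}{\mathsf{tagc} - \mathsf{tagk}}} = e(g,w)^{r_1 t}$. It finally recovers the plaintext as
$M = E_0 / Z^{s_2} = E_0 / e(g,g)^{\alpha \cdot a_1 \cdot b \cdot s_2} \leftarrow E_0 \cdot W_2 \cdot W_1^{-1}$.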

The correctness of $W_2$ is shown in appendix A.1, while the rest follows from
[27]. As we can see, ciphertexts have the same size as in the IBE scheme of [27],
no matter how large the vector $Y$ is. Also, decryption entails a constant number
of pairing evaluations (whereas ciphertexts cost $O(n)$ pairings to decrypt in [18]).

Theorem 2. Construction 3 is adaptively secure under the DLIN and DBDH
assumptions.

Proof. The proof uses the dual system methodology, similarly to [27], which
involves ciphertexts and private keys that can be normal or semi-functional.
◦ Semi-functional ciphertexts are generated by first computing a normal ciphertext
$(C'_0, C'_1, \ldots, C'_7, E'_1, E'_2, \mathsf{tagc}')$ and then choosing $\chi \xleftarrow{R} \mathbb{Z}_p$ before
replacing $(C'_4, C'_5, C'_6, C'_7)$, respectively, by

$$C_4 = C'_4 \cdot g^{b a_2 \chi}, \quad C_5 = C'_5 \cdot g^{a_2 \chi}, \quad C_6 = C'_6 \cdot v_2^{a_2 \chi}, \quad C_7 = C'_7 \cdot v_2^{a_2 b \chi}. \qquad (4)$$

◦ From a normal key $(D'_1, \ldots, D'_7, K'_2, \ldots, K'_n, \mathsf{tagk}_2, \ldots, \mathsf{tagk}_n)$, semi-functional
keys are obtained by choosing $\gamma \xleftarrow{R} \mathbb{Z}_p$ and replacing $(D'_1, D'_2, D'_4)$ by

$$D_1 = D'_1 \cdot g^{-a_1 a_2 \gamma}, \quad D_2 = D'_2 \cdot g^{a_2 \gamma}, \quad D_4 = D'_4 \cdot g^{a_1 \gamma}. \qquad (5)$$

The proof proceeds with a game sequence starting from $\mathrm{Game}_{\mathrm{Real}}$, which is the
actual attack game. The following games are defined below.
$\mathrm{Game}_0$ is the real attack game, except that the challenge ciphertext is semi-functional.
$\mathrm{Game}_k$ (for $1 \le k \le q$) is identical to $\mathrm{Game}_0$, except that the first $k$ private key
generation queries are answered by returning a semi-functional key.
$\mathrm{Game}_{q+1}$ is as $\mathrm{Game}_q$, but the challenge ciphertext is a semi-functional encryption
of a random element of $\mathbb{G}_T$ instead of the actual plaintext.

We prove the indistinguishability between two consecutive games under some
assumptions. The sequence ends in $\mathrm{Game}_{q+1}$, where the challenge ciphertext is
independent of the challenger's bit $\beta$; hence no adversary has any advantage. □

The indistinguishability of $\mathrm{Game}_{\mathrm{Real}}$ and $\mathrm{Game}_0$, as well as that of $\mathrm{Game}_q$ and
$\mathrm{Game}_{q+1}$, can be proved exactly in the same way as in [27]; the details are
given in the full version of the paper.

Lemma 1. If DLIN is hard, $\mathrm{Game}_0$ is indistinguishable from $\mathrm{Game}_{\mathrm{Real}}$.

Lemma 2. For any $1 \le k \le q$, if an adversary A can distinguish $\mathrm{Game}_k$ from
$\mathrm{Game}_{k-1}$, we can build a distinguisher for the DLIN problem.

This lemma is the most non-trivial part of the theorem. The main issue is that,
in order to enable adaptive security, the reduction must be done in such a way
that the simulator B can create semi-functional keys for any vector $X$, including
those for which $X \cdot Y^* = 0$. However, the crucial point is that we must prevent
B from directly deciding whether the $k$-th queried private key is normal or
semi-functional by generating a semi-functional ciphertext for itself. Indeed, if this
were possible, the reduction from A would not be established.

To resolve this, we use a secret exponent vector $\zeta \in \mathbb{Z}_p^n$ and embed the DLIN
instance so that B can simulate only the key at the $k$-th query for $X$ with tags
$(\mathsf{tagk}_2, \ldots, \mathsf{tagk}_n)$ and the challenge ciphertext for $Y^*$ with $\mathsf{tagc}^*$ that obey the
relation $(\mathsf{tagk}_2, \ldots, \mathsf{tagk}_n, \mathsf{tagc}^*)^T = -M_{X,Y^*} \cdot \zeta$, where $M_{X,Y^*}$ is the $n \times n$ matrix
defined in Eq. (2). We thereby conceptually use the n-equation technique here.
A particular consequence is that if $X \cdot Y^* = 0$, then it holds that

$$\mathsf{tagk} = \sum_{i=2}^{n} \mathsf{tagk}_i\, y_i^* = \sum_{i=2}^{n} \zeta_1 \frac{x_i}{x_1} y_i^* - \sum_{i=2}^{n} \zeta_i y_i^* = \zeta_1 \cdot (-y_1^*) - \sum_{i=2}^{n} \zeta_i y_i^* = \mathsf{tagc}^*,$$

which is exactly the condition that hampers decryption, so that B cannot test
the key by itself, as desired. We are now ready to describe the proof of Lemma 2.
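The tag relation used by the simulator can be checked directly. In the Python sketch below (illustrative prime, hypothetical names), B's choice $\mathsf{tagk}_i = \zeta_1 x_i / x_1 - \zeta_i$ and $\mathsf{tagc}^* = -\langle Y^*, \zeta \rangle$ makes the aggregated key tag collide with $\mathsf{tagc}^*$ whenever $X \cdot Y^* = 0$; in general $\mathsf{tagk} - \mathsf{tagc}^* = \zeta_1 (X \cdot Y^*)/x_1$.

```python
import random

PRIME = 2**61 - 1  # a prime modulus, chosen only for illustration


def simulated_tags(X, Y_star, seed=0):
    """Tags chosen by B in the proof of Lemma 2. Returns (tagk, tagc*), where
    tagk = sum_{i>=2} tagk_i y*_i is the aggregate used by Decrypt."""
    rng = random.Random(seed)
    n = len(X)
    zeta = [rng.randrange(PRIME) for _ in range(n)]        # secret vector zeta
    inv_x1 = pow(X[0], PRIME - 2, PRIME)
    tagk_i = [(zeta[0] * X[i] * inv_x1 - zeta[i]) % PRIME for i in range(1, n)]
    tagk = sum(tk * y for tk, y in zip(tagk_i, Y_star[1:])) % PRIME
    tagc_star = -sum(z * y for z, y in zip(zeta, Y_star)) % PRIME
    return tagk, tagc_star
```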

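Proof. The distinguisher B receives $(g, f, \nu, g^{\theta_1}, f^{\theta_2}, \eta)$ and decides whether $\eta = \nu^{\theta_1 + \theta_2}$.
Setup. Algorithm B picks $\alpha, a_1, a_2, \delta_{v_1}, \delta_{v_2} \xleftarrow{R} \mathbb{Z}_p$ and sets $g = g$, $Z = e(f, g)^{\alpha a_1}$,

$$A_1 = g^{a_1}, \quad A_2 = g^{a_2}, \quad B = g^b = f, \quad v_1 = \nu^{a_2} \cdot g^{\delta_{v_1}},$$

$$B_1 = g^{b a_1} = f^{a_1}, \quad B_2 = g^{b a_2} = f^{a_2}, \quad v = \nu^{-a_1 a_2}, \quad v_2 = \nu^{a_1} \cdot g^{\delta_{v_2}},$$

$$\tau_1 = v v_1^{a_1} = g^{\delta_{v_1} a_1}, \quad \tau_2 = v v_2^{a_2} = g^{\delta_{v_2} a_2}, \quad \tau_1^b = f^{\delta_{v_1} a_1}, \quad \tau_2^b = f^{\delta_{v_2} a_2}.$$

Next, B chooses $\delta_w \xleftarrow{R} \mathbb{Z}_p$, $\zeta = (\zeta_1, \ldots, \zeta_n) \xleftarrow{R} \mathbb{Z}_p^n$, $\delta = (\delta_1, \ldots, \delta_n) \xleftarrow{R} \mathbb{Z}_p^n$, then
defines $w = g^{\alpha_0} = f \cdot g^{\delta_w}$ and $h_i = g^{\alpha_i} = f^{\zeta_i} \cdot g^{\delta_i}$ for $i = 1, \ldots, n$. Note that, as
in the proof of Lemma 2 in [27], B knows $msk = (g^{\alpha}, g^{\alpha a_1}, v, v_1, v_2)$.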
Key Queries. When A makes the $j$-th private key query, B does as follows.
[Case $j > k$] It generates a normal key, using the master secret key $msk$.

[Case $j < k$] It creates a semi-functional key, which it can do using $g^{a_1 a_2}$.
[Case $j = k$] It defines $\mathsf{tagk}_2, \ldots, \mathsf{tagk}_n$ as $\mathsf{tagk}_i = \zeta_1 \cdot \frac{x_i}{x_1} - \zeta_i$ for $i = 2, \ldots, n$,
which implies that $h_1^{-x_i/x_1} \cdot h_i \cdot w^{\mathsf{tagk}_i} = g^{-\delta_1 (x_i/x_1) + \delta_i + \delta_w \mathsf{tagk}_i}$ for $i = 2, \ldots, n$.
Using these tags, it generates a normal private key $(D'_1, \ldots, D'_7, K'_2, \ldots, K'_n)$
using random exponents $r'_1, r'_2, z'_1, z'_2 \xleftarrow{R} \mathbb{Z}_p$. Then, it sets

$$D_1 = D'_1 \cdot \eta^{-a_1 a_2}, \quad D_2 = D'_2 \cdot \eta^{a_2} \cdot (g^{\theta_1})^{\delta_{v_1}}, \quad D_3 = D'_3 \cdot (f^{\theta_2})^{\delta_{v_1}},$$

$$D_4 = D'_4 \cdot \eta^{a_1} \cdot (g^{\theta_1})^{\delta_{v_2}}, \quad D_5 = D'_5 \cdot (f^{\theta_2})^{\delta_{v_2}}, \quad D_6 = D'_6 \cdot f^{\theta_2},$$

as well as $D_7 = D'_7 \cdot g^{\theta_1}$ and $K_i = K'_i \cdot (g^{\theta_1})^{-\delta_1 (x_i/x_1) + \delta_i + \delta_w \mathsf{tagk}_i}$ for $i = 2, \ldots, n$.

If $\eta = \nu^{\theta_1 + \theta_2}$, $sk_X = (D_1, \ldots, D_7, K_2, \ldots, K_n, \mathsf{tagk}_2, \ldots, \mathsf{tagk}_n)$ is easily seen
to form a normal key, where $r_1 = r'_1 + \theta_1$, $r_2 = r'_2 + \theta_2$, $z_1 = z'_1 - \delta_{v_1} \theta_2$,
$z_2 = z'_2 - \delta_{v_2} \theta_2$ are the underlying random exponents. If $\eta \in_R \mathbb{G}$, it can be written
$\eta = \nu^{\theta_1 + \theta_2} \cdot g^{\gamma}$ for some $\gamma \in_R \mathbb{Z}_p$, so that $sk_X$ is distributed as a semi-functional
key. We note that $\mathsf{tagk}_2, \ldots, \mathsf{tagk}_n$ are independent and uniformly distributed,
since $\zeta_1, \ldots, \zeta_n$ (which are the solutions of a system of $n - 1$ equations with $n$
unknowns) are uniformly random and perfectly hidden from A's view.

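Challenge. A outputs $M_0, M_1 \in \mathbb{G}_T$ along with a vector $Y^* = (y_1^*, \ldots, y_n^*)$.
B flips a coin $\beta \xleftarrow{R} \{0,1\}$ and computes the tag $\mathsf{tagc}^* = -\langle Y^*, \zeta \rangle$, for which B
will be able to prepare the semi-functional ciphertext. To this end, B first computes
a normal encryption $(C'_0, C'_1, \ldots, C'_7, E'_1, E'_2, \mathsf{tagc}^*)$ of $M_\beta$ using exponents
$s'_1, s'_2, t'$. It then chooses $\chi \xleftarrow{R} \mathbb{Z}_p$ and computes

$$C_4 = C'_4 \cdot f^{a_2 \chi}, \quad C_5 = C'_5 \cdot g^{a_2 \chi}, \quad C_7 = C'_7 \cdot \nu^{-\delta_w a_1 a_2 \chi} \cdot f^{\delta_{v_2} a_2 \chi},$$

$$C_6 = C'_6 \cdot v_2^{a_2 \chi}, \quad E_2 = E'_2 \cdot \nu^{a_1 a_2 \chi}, \quad E_1 = E'_1 \cdot \nu^{(\delta_w \cdot \mathsf{tagc}^* + \langle Y^*, \delta \rangle) a_1 a_2 \chi}.$$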

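We claim that $(C'_0, C'_1, C'_2, C'_3, C_4, C_5, C_6, C_7, E_1, E_2, \mathsf{tagc}^*)$ is a semi-functional
ciphertext with underlying exponents $\chi$, $s_1 = s'_1$, $s_2 = s'_2$ and
$t = t' + \log_g(\nu)\, a_1 a_2 \chi$. To prove this, we observe that

$$\begin{aligned} C_7 &= T_1^{s_1} \cdot T_2^{s_2} \cdot w^{-t} \cdot v_2^{a_2 b \chi} = T_1^{s_1} \cdot T_2^{s_2} \cdot w^{-t' - \log_g(\nu) a_1 a_2 \chi} \cdot (\nu^{a_1} \cdot g^{\delta_{v_2}})^{a_2 b \chi} \\ &= T_1^{s_1} \cdot T_2^{s_2} \cdot w^{-t'} \cdot (f \cdot g^{\delta_w})^{-\log_g(\nu) a_1 a_2 \chi} \cdot (\nu^{a_1} \cdot g^{\delta_{v_2}})^{a_2 b \chi} \\ &= C'_7 \cdot \nu^{-\delta_w a_1 a_2 \chi} \cdot f^{\delta_{v_2} a_2 \chi}, \end{aligned}$$

where the unknown term in $v_2^{a_2 b \chi}$ is canceled out by $w^{-t}$. Also,

$$\begin{aligned} E_1 &= E'_1 \cdot (h_1^{y_1^*} \cdots h_n^{y_n^*} \cdot w^{\mathsf{tagc}^*})^{\log_g(\nu) a_1 a_2 \chi} \\ &= E'_1 \cdot \big((f^{\zeta_1} g^{\delta_1})^{y_1^*} \cdots (f^{\zeta_n} g^{\delta_n})^{y_n^*} \cdot (f g^{\delta_w})^{-\langle Y^*, \zeta \rangle}\big)^{\log_g(\nu) a_1 a_2 \chi} \\ &= E'_1 \cdot \nu^{(\langle Y^*, \delta \rangle + \delta_w \cdot \mathsf{tagc}^*) a_1 a_2 \chi}, \end{aligned}$$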
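where the unknown $f^{\log_g(\nu)}$ vanishes due to our definition of $\mathsf{tagc}^*$. It then
remains to show that $\mathsf{tagc}^*, \mathsf{tagk}_2, \ldots, \mathsf{tagk}_n$ are still $n$-wise independent. This
holds since their relations form the system

$$\begin{pmatrix} \mathsf{tagk}_2 \\ \mathsf{tagk}_3 \\ \vdots \\ \mathsf{tagk}_n \\ \mathsf{tagc}^* \end{pmatrix} = - \begin{pmatrix} -x_2/x_1 & 1 & 0 & \cdots & 0 \\ -x_3/x_1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ -x_n/x_1 & 0 & 0 & \cdots & 1 \\ y_1^* & y_2^* & y_3^* & \cdots & y_n^* \end{pmatrix} \begin{pmatrix} \zeta_1 \\ \zeta_2 \\ \vdots \\ \zeta_n \end{pmatrix},$$

which has a unique solution in $\zeta$ whenever $\det(M_{X,Y^*}) = (-1)^{n+1} X \cdot Y^* / x_1 \neq 0$; the map
from $\zeta$ to the tags is then a bijection, so the tags are uniform since $\zeta$ is.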

Eventually, A outputs a bit $\beta'$, and B outputs 0 if $\beta = \beta'$. As in [27], we see
that A is playing $\mathrm{Game}_{k-1}$ if $\eta = \nu^{\theta_1 + \theta_2}$ and $\mathrm{Game}_k$ otherwise. □

Lemma 3. If DBDH is hard, $\mathrm{Game}_q$ and $\mathrm{Game}_{q+1}$ are indistinguishable.

5 Functional Encryption for Non-zero Inner-Product 

5.1 Negated Spatial Encryption 

We begin this section by providing a co-selectively secure construction of negated
spatial encryption, which is motivated by its implication of non-zero IPE. At a
high level, our scheme can be viewed as a "negative" analogue of the Boneh-Hamburg
spatial encryption [9], in very much the same way as the Lewko-Sahai-Waters
revocation scheme [19] is a negative analogue of the Boneh-Boyen IBE [1].
The intuition follows exactly that of section 1, where we have to use the "n-equation
technique". In spatial encryption, we generally have to deal with how to
set up a system of $n$ equations similar to Eq. (1). To this end, we restrict
the vector subspaces that we can use as follows. Our construction is an FE for
$R^{\mathrm{Neg}(\mathrm{Spatial})} : W_n \times \mathbb{Z}_p^n \to \{0,1\}$, where we define a collection $W_n \subset V_n$ of vector
subspaces in $\mathbb{Z}_p^n$ as $W_n = \{\mathrm{Aff}(M, 0) \in V_n \mid \mathrm{rank}(M_{(-1)}) = n - 1\}$, where
$M_{(-1)}$ denotes the matrix obtained by deleting the first row $M_1 \in \mathbb{Z}_p^{1 \times d}$ of $M$.

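Construction 4. (Co-selectively secure negated spatial encryption)

► Setup($1^\lambda$, n): chooses a bilinear group $\mathbb{G}$ of prime order $p > 2^\lambda$ with a random generator $g \leftarrow \mathbb{G}$. It randomly chooses $\alpha, \alpha_1, \ldots, \alpha_n \leftarrow \mathbb{Z}_p$. Let $\vec\alpha = (\alpha_1, \ldots, \alpha_n)$. The public key is $pk = (g, g^{\vec\alpha}, g^{\alpha_1 \vec\alpha}, e(g,g)^{\alpha})$. The master key is $msk = (\alpha, \vec\alpha)$.

► KeyGen(V, msk, pk): suppose that $V = \mathrm{Aff}(M, \vec 0)$ for a matrix $M \in \mathbb{Z}_p^{n \times d}$. The algorithm picks $t \leftarrow \mathbb{Z}_p$ and outputs $sk_V = (D_0, D_1, K) \in \mathbb{G}^{d+2}$, where

$$D_0 = g^t, \quad D_1 = g^{\alpha + t\alpha_1^2}, \quad K = g^{t M^\top \vec\alpha}.$$

► Encrypt(Y, M, pk): picks $s \leftarrow \mathbb{Z}_p$ and computes $(C_0, C_1, C_2, C_3)$ as

$$C_0 = M \cdot e(g,g)^{\alpha s}, \quad C_1 = g^{s \alpha_1 \langle \vec\alpha, Y \rangle}, \quad C_2 = g^s, \quad C_3 = g^{\alpha_1 s}.$$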
► Decrypt(C, Y, $sk_V$, V, pk): the algorithm first obtains $M$ from $V$. Recall that $M_1$ denotes the first row of $M$. It first solves the system of equations in $w$ given by $M_{(-1)} w = (y_2, \ldots, y_n)^\top$, which it can do since $V \in \mathcal{W}_n$. It then computes the message blinding factor $e(g,g)^{\alpha s}$ as

$$e(D_1, C_2) \cdot \left(\frac{e(C_1, D_0)}{e(K^{w}, C_3)}\right)^{\frac{1}{M_1 w - y_1}} = e(g^{\alpha + t\alpha_1^2}, g^s) \cdot \left(\frac{e(g^{s\alpha_1 \langle \vec\alpha, Y \rangle}, g^t)}{e(g^{t w^\top M^\top \vec\alpha}, g^{\alpha_1 s})}\right)^{\frac{1}{M_1 w - y_1}},$$

where $K^{w}$ denotes $\prod_{j=1}^{d} K_j^{w_j} = g^{t w^\top M^\top \vec\alpha}$.

Computability. We claim that decryption can be computed if $Y \notin V$. Indeed, we prove that if $Y \notin V$ then $M_1 w - y_1 \neq 0$ (so the above equation is well defined). To prove the contrapositive, suppose that $M_1 w - y_1 = 0$. Then we must have $Y \in V$, since

$$M w = \begin{pmatrix} M_1 w \\ M_{(-1)} w \end{pmatrix} = \begin{pmatrix} y_1 \\ (y_2, \ldots, y_n)^\top \end{pmatrix} = Y.$$

Correctness. We verify that decryption is correct as follows. First, we note that, due to our definition of $w$, we have $\langle Mw - Y, \vec\alpha \rangle = (M_1 w - y_1)\alpha_1$. Therefore, correctness follows from the fact that

$$\left(\frac{e(g^{s\alpha_1 \langle \vec\alpha, Y \rangle}, g^t)}{e(g^{t w^\top M^\top \vec\alpha}, g^{\alpha_1 s})}\right)^{\frac{1}{M_1 w - y_1}} = e(g,g)^{-\frac{ts\alpha_1 \langle Mw - Y, \vec\alpha \rangle}{M_1 w - y_1}} = e(g,g)^{-ts\alpha_1^2},$$

which cancels the factor $e(g,g)^{ts\alpha_1^2}$ in $e(D_1, C_2) = e(g,g)^{\alpha s + ts\alpha_1^2}$.

Theorem 3. Construction 4 is co-selectively secure under the q-Decisional Multi-Exponent Bilinear Diffie-Hellman assumption (q is the number of key queries). (The proof is given in the full paper, where the assumption [19] is also recalled.)

Implications. For a vector $X \in \mathbb{Z}_p^n$, the embedding $V_X = \mathrm{Aff}(M_X, \vec 0_n)$ defined earlier is easily seen to be in the limited domain $\mathcal{W}_n$, since $(M_X)_{(-1)}$ is an identity matrix of size $n - 1$ and hence $\mathrm{rank}((M_X)_{(-1)}) = n - 1$. Therefore, by Corollary 1, the above scheme implies non-zero IPE.
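As a sanity check of the decryption equation, the sketch below tracks every group element of Construction 4 by its discrete logarithm base $g$, so that a pairing $e(g^a, g^b)$ becomes the exponent $a \cdot b$ of $e(g,g)$. It assumes the component forms $D_0 = g^t$, $D_1 = g^{\alpha + t\alpha_1^2}$, $K = g^{tM^\top\vec\alpha}$, $C_1 = g^{s\alpha_1\langle\vec\alpha, Y\rangle}$, $C_2 = g^s$, $C_3 = g^{\alpha_1 s}$, picks $M$ with $M_{(-1)}$ equal to the identity (so that $w = (y_2, \ldots, y_n)$), and uses the Mersenne prime $2^{61} - 1$ as a stand-in for $p$. It is an arithmetic check only and has no cryptographic meaning; the helper names are hypothetical.

```python
import random

P = 2**61 - 1  # Mersenne prime standing in for the group order p


def inner(u, v):
    return sum(a * b for a, b in zip(u, v)) % P


def check_construction4(n, rng):
    """Return True if the decryption equation yields alpha*s, or None if Y lies in V."""
    alpha = rng.randrange(P)
    alphas = [rng.randrange(1, P) for _ in range(n)]
    t, s = rng.randrange(1, P), rng.randrange(1, P)
    # V = Aff(M, 0) with M_(-1) = identity, so d = n - 1 and w = (y_2, ..., y_n).
    first_row = [rng.randrange(P) for _ in range(n - 1)]
    M = [first_row] + [[int(i == j) for j in range(n - 1)] for i in range(n - 1)]
    y = [rng.randrange(P) for _ in range(n)]
    w = y[1:]
    m1w = inner(first_row, w)
    if m1w == y[0]:
        return None  # Y lies in V, so decryption is not defined
    # Discrete logs (base g) of the key and ciphertext components.
    D0, D1 = t, (alpha + t * alphas[0] ** 2) % P
    K = [t * inner([M[i][j] for i in range(n)], alphas) % P for j in range(n - 1)]
    C1 = s * alphas[0] * inner(alphas, y) % P
    C2, C3 = s, alphas[0] * s % P
    # A pairing e(g^a, g^b) contributes a*b; K^w has discrete log sum_j w_j * K_j.
    ratio = (C1 * D0 - inner(w, K) * C3) % P
    blind = (D1 * C2 + ratio * pow((m1w - y[0]) % P, -1, P)) % P
    return blind == alpha * s % P


rng = random.Random(2010)
print(all(check_construction4(n, rng) in (True, None) for n in (2, 3, 5, 8)))  # True
```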


5.2 Non-zero IPE under Simple Assumptions 

Our negated spatial encryption scheme is proven co-selectively secure under a non-standard q-type assumption introduced in [19]. Here, we show that the dual system technique [27] makes it possible to rely on simple assumptions such as DBDH and DLIN. The scheme is very similar to the zero IPE scheme of Section 4.2, and we only state the differences. The intuition again follows exactly from Section 1, and the security proof uses techniques similar to those in [15].

Construction 5. (Co-selectively secure non-zero IPE)

► Setup($1^\lambda$, n): outputs pk exactly as in Construction 3, except that we define $w = g^{\alpha_1}$ ($= h_1$) in this scheme, instead of $g^{\alpha_0}$.

► Keygen(X, msk, pk): outputs $sk_X = (sk_{adapt}, sk_{core})$, where $sk_{adapt}$ is the same as in Construction 3 (with $w = g^{\alpha_1}$) and $sk_{core} = \{K_i = (g^{-\alpha_1 x_i / x_1} \cdot g^{\alpha_i})^{r_1}\}_{i = 2, \ldots, n}$.

► Encrypt(Y, M, pk): outputs $C = (C_{adapt}, C_{core})$, where $C_{adapt}$ is as in Construction 3 (with $w = g^{\alpha_1}$) and $C_{core} = (E_1 = (h_1^{y_1} \cdots h_n^{y_n})^t, E_2 = g^t)$.












► Decrypt(C, Y, $sk_X$, X, pk): computes $W_1$ as in Construction 3 and $W_2$ as

$$W_2 = \left(\frac{e(K_2^{y_2} \cdots K_n^{y_n}, E_2)}{e(E_1, D_7)}\right)^{-x_1 / \langle X, Y \rangle} = e(g, w)^{r_1 t}$$

(see Appendix A.2).

Theorem 4. Construction 5 is co-selectively secure under the DLIN and DBDH assumptions. (The proof is deferred to the full version of the paper.)


5.3 A Generalization of the Scheme and Its Application 

Extended Ciphertext Attribute Domain. The above scheme for the relation $R^{\mathrm{NIPE}_n}: \mathbb{Z}_p^n \times \mathbb{Z}_p^n \to \{0,1\}$ can be extended so as to support relations of the form $R^{\mathrm{NIPE}^*_n}: \mathbb{Z}_p^n \times (\mathbb{Z}_p^n)^d \to \{0,1\}$, for some $d \in \mathrm{poly}(\lambda)$, defined as $R^{\mathrm{NIPE}^*_n}(X, (Y_1, \ldots, Y_d)) = 1$ iff $X \cdot Y_i \neq 0$ for all $i = 1, \ldots, d$.

We construct this extended system by setting up exactly the same public and private keys (for $X$) as in the original scheme. To encrypt to $(Y_1, \ldots, Y_d)$, the scheme generates $C_0, \ldots, C_7$ as usual with the underlying exponents $s_1, s_2, t$. Then, it chooses $t_1, \ldots, t_d \in \mathbb{Z}_p$ such that $t = t_1 + \cdots + t_d$ and, for $i = 1, \ldots, d$, parses $Y_i = (y_{i,1}, \ldots, y_{i,n})$ and computes $E_{1,i} = (g^{\langle \vec\alpha, Y_i \rangle})^{t_i} = (h_1^{y_{i,1}} \cdots h_n^{y_{i,n}})^{t_i}$ and $E_{2,i} = g^{t_i}$, so that the ciphertext is $(C_0, \ldots, C_7, \{E_{1,i}, E_{2,i}\}_{i = 1, \ldots, d})$.

Decryption requires first computing

$$W_{2,i} = \left(\frac{e(K_2^{y_{i,2}} \cdots K_n^{y_{i,n}}, E_{2,i})}{e(E_{1,i}, D_7)}\right)^{-x_1 / \langle X, Y_i \rangle} = e(g, w)^{r_1 t_i}$$

for $i = 1, \ldots, d$, from which the receiver obtains $W_2 = W_{2,1} \cdots W_{2,d} = e(g, w)^{r_1 t}$. The rest is then done as usual, and we explain in the full version of the paper how the security proof must be adapted.
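A minimal exponent-level check of the extended decryption is sketched below. It assumes $h_i = g^{\alpha_i}$, $w = g^{\alpha_1}$, $K_i = (g^{-\alpha_1 x_i/x_1} g^{\alpha_i})^{r_1}$, $D_7 = g^{r_1}$, and $t = t_1 + \cdots + t_d$; the function name and the prime $2^{61} - 1$ (standing in for $p$) are our own illustrative choices. Group elements are represented by their discrete logarithms, so multiplying the $W_{2,i}$ corresponds to adding exponents of $e(g,g)$.

```python
import random

P = 2**61 - 1  # prime standing in for the group order p


def check_extended(x, ys, rng):
    """True if W_{2,1} * ... * W_{2,d} = e(g, w)^{r1 t}; None if some <X, Y_i> = 0."""
    n = len(x)
    alphas = [rng.randrange(1, P) for _ in range(n)]  # h_i = g^{alpha_i}, w = h_1
    r1 = rng.randrange(1, P)
    ts = [rng.randrange(P) for _ in ys]               # t_1, ..., t_d
    t = sum(ts) % P
    inv_x1 = pow(x[0], -1, P)
    # log_g(K_i) / r1, with K_i = (g^{-alpha_1 x_i / x_1} g^{alpha_i})^{r1}; index 0 unused
    k_log = [(alphas[i] - alphas[0] * x[i] * inv_x1) % P for i in range(n)]
    w2 = 0  # exponent of W_2 with respect to e(g, g)
    for y, ti in zip(ys, ts):
        ip = sum(a * b for a, b in zip(x, y)) % P
        if ip == 0:
            return None  # X . Y_i = 0: this key cannot decrypt
        num = r1 * sum(y[i] * k_log[i] for i in range(1, n)) * ti  # e(prod K_i^{y_i,i}, E_{2,i})
        den = ti * sum(a * b for a, b in zip(alphas, y)) * r1       # e(E_{1,i}, D_7)
        w2 = (w2 + (num - den) * (-x[0]) * pow(ip, -1, P)) % P      # multiply in W_{2,i}
    return w2 == alphas[0] * r1 * t % P


print(check_extended([1, 2, 3], [[1, 1, 1], [2, 0, 5]], random.Random(0)))  # True
```

Because each factor equals $e(g,w)^{r_1 t_i}$ exactly, the product recombines to $e(g,w)^{r_1 t}$ whatever the split of $t$, while a single $Y_i$ orthogonal to $X$ already makes decryption undefined.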


Applications. We can obtain an identity-based revocation scheme with a parameter tradeoff from the aforementioned extension. The instantiation of the ID-based revocation scheme ($\mathrm{IBR}_{\le n}$) from our non-zero inner-product system $\mathrm{NIPE}_{n+1}$ yields a construction with $O(1)$-size ciphertexts and $O(n)$-size private keys, where $n$ denotes the maximal number of revoked users.

From our extended scheme $\mathrm{NIPE}^*_{n+1}$, we can obtain an ID-based revocation scheme $\mathrm{IBR}_{\mathrm{poly}(\lambda)}$ without a fixed maximal number of revoked users. To revoke a set $R$ with $|R| = r$, we divide it into a disjoint union $R = R_1 \cup \cdots \cup R_{r/n}$, where $|R_i| = n$ for all $i$ (we assume that $n$ divides $r$). We then simply construct the vector $Y_i$ from the revocation subset $R_i$ for each $i \in [1, r/n]$, in the same way as we use $\mathrm{NIPE}_{n+1}$ to instantiate $\mathrm{IBR}_{\le n}$. We finally encrypt using the set of vectors $(Y_1, \ldots, Y_{r/n})$. The correctness and security properties hold since $R^{\mathrm{IBR}_{\le n}}(\mathrm{ID}, R) = 1 \iff R^{\mathrm{IBR}_{\mathrm{poly}(\lambda)}}(\mathrm{ID}, (R_1, \ldots, R_{r/n})) = 1$. The construction has $O(r/n)$-size ciphertexts and $O(n)$-size private keys. Interestingly, we note that the second scheme described by Lewko, Sahai and Waters [15] (which indeed inspired ours) can be viewed as a special case of our scheme where $n = 1$.
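The block partition and the relation equivalence can be illustrated in a few lines. The text does not spell out how $\mathrm{NIPE}_{n+1}$ instantiates $\mathrm{IBR}_{\le n}$, so this sketch assumes the usual polynomial encoding: a key for $\mathrm{ID}$ uses $X = (1, \mathrm{ID}, \ldots, \mathrm{ID}^n)$, and a revoked block $R_i$ is encoded by the coefficient vector $Y_i$ of $\prod_{j \in R_i}(z - j)$, so that $\langle X, Y_i \rangle = \prod_{j \in R_i}(\mathrm{ID} - j)$ is non-zero exactly when $\mathrm{ID} \notin R_i$. All function names are hypothetical, and $2^{61} - 1$ stands in for $p$.

```python
P = 2**61 - 1  # prime standing in for the group order p


def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod_{r in roots} (z - r) mod P."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] = (nxt[k] - r * c) % P      # contribution of -r * c * z^k
            nxt[k + 1] = (nxt[k + 1] + c) % P  # contribution of c * z^(k+1)
        coeffs = nxt
    return coeffs


def key_vector(identity, n):
    """X = (1, ID, ID^2, ..., ID^n) for a key of IBR_{<=n}."""
    return [pow(identity, k, P) for k in range(n + 1)]


def ciphertext_vectors(revoked, n):
    """Split R into blocks R_1, ..., R_{r/n} of size n and encode each as Y_i."""
    assert len(revoked) % n == 0, "we assume that n divides r, as in the text"
    blocks = [revoked[i:i + n] for i in range(0, len(revoked), n)]
    return [poly_from_roots(block) for block in blocks]


def can_decrypt(identity, ys, n):
    """R^{NIPE*}(X, (Y_1, ..., Y_d)) = 1 iff <X, Y_i> != 0 for every block."""
    x = key_vector(identity, n)
    return all(sum(a * b for a, b in zip(x, y)) % P != 0 for y in ys)


ys = ciphertext_vectors([3, 8, 15, 21, 42, 77], 3)
print(can_decrypt(15, ys, 3), can_decrypt(16, ys, 3))  # False True
```

The ciphertext carries $r/n$ vectors of length $n + 1$, matching the $O(r/n)$ ciphertext size stated above, while each key vector has length $n + 1$.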
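A Verifying Correctness in Decryption

A.1 For the Zero IPE Scheme of Section 4.2

$$W_2 = \left(\frac{e\big(\prod_{i=2}^{n} (g^{-\alpha_1 x_i / x_1} g^{\alpha_i} w^{tag_{k_i}})^{r_1 y_i}, g^t\big)}{e\big((h_1^{y_1} \cdots h_n^{y_n} \cdot w^{tag_c})^t, g^{r_1}\big)}\right)^{\frac{1}{tag_k - tag_c}} = \left(\frac{e(g,g)^{r_1 t (\langle \vec\alpha, Y \rangle - \alpha_1 \langle X, Y \rangle / x_1)} \cdot e(g,w)^{r_1 t \cdot tag_k}}{e(g,g)^{r_1 t \langle \vec\alpha, Y \rangle} \cdot e(g,w)^{r_1 t \cdot tag_c}}\right)^{\frac{1}{tag_k - tag_c}} = e(g,w)^{r_1 t},$$

where $tag_k = \sum_{i=2}^{n} y_i \cdot tag_{k_i}$ and the last equality uses $\langle X, Y \rangle = 0$.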
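The zero-IPE identity can be checked the same way, working with exponents only. This sketch assumes the forms $K_i = (g^{-\alpha_1 x_i/x_1} g^{\alpha_i} w^{tag_{k_i}})^{r_1}$, $E_1 = (h_1^{y_1} \cdots h_n^{y_n} w^{tag_c})^t$, $E_2 = g^t$, $D_7 = g^{r_1}$, $h_i = g^{\alpha_i}$, $w = g^{\alpha_0}$ and $tag_k = \sum_{i \ge 2} y_i \cdot tag_{k_i}$; the prime $2^{61} - 1$ stands in for $p$ and the function name is hypothetical.

```python
import random

P = 2**61 - 1  # prime standing in for the group order p


def check_a1(x, y, rng):
    """Exponent-level check of W_2 = e(g, w)^{r1 t} in the zero-IPE case (needs x_1 != 0)."""
    n = len(x)
    alphas = [rng.randrange(P) for _ in range(n)]             # h_i = g^{alpha_i}
    alpha0 = rng.randrange(1, P)                              # w = g^{alpha_0}
    r1, t = rng.randrange(1, P), rng.randrange(1, P)
    tagk_i = [0] + [rng.randrange(P) for _ in range(n - 1)]   # tag_{k_2}, ..., tag_{k_n}
    tagc = rng.randrange(P)
    tagk = sum(y[i] * tagk_i[i] for i in range(1, n)) % P
    if tagk == tagc:
        return None  # the exponent 1 / (tag_k - tag_c) is undefined
    inv_x1 = pow(x[0], -1, P)
    # log_g of prod_{i>=2} K_i^{y_i}
    k_exp = sum(y[i] * r1 * (alphas[i] - alphas[0] * x[i] * inv_x1 + alpha0 * tagk_i[i])
                for i in range(1, n)) % P
    # log_g of E_1 = (h_1^{y_1} ... h_n^{y_n} w^{tag_c})^t
    e1_exp = t * (sum(a * b for a, b in zip(alphas, y)) + alpha0 * tagc) % P
    ratio = (k_exp * t - e1_exp * r1) % P  # e(prod K_i^{y_i}, E_2) / e(E_1, D_7)
    w2 = ratio * pow((tagk - tagc) % P, -1, P) % P
    return w2 == alpha0 * r1 * t % P       # e(g, w)^{r1 t} = e(g, g)^{alpha0 r1 t}


print(check_a1([1, 2, 3], [5, -1, -1], random.Random(4)))
```

When $\langle X, Y \rangle \neq 0$ the leftover factor $e(g,g)^{-r_1 t \alpha_1 \langle X, Y\rangle / x_1}$ survives, so the check fails, as expected for a key that must not decrypt.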
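A.2 For the Non-zero IPE Scheme of Section 5.2

$$W_2 = \left(\frac{e\big(\prod_{i=2}^{n} (g^{-\alpha_1 x_i / x_1} g^{\alpha_i})^{r_1 y_i}, g^t\big)}{e(E_1, D_7)}\right)^{-x_1 / \langle X, Y \rangle} = \left(\frac{e\big(g^{\sum_{i=2}^{n} (\alpha_i - \alpha_1 x_i / x_1) y_i}, g^{r_1 t}\big)}{e\big(g^{\alpha_1 y_1 + \cdots + \alpha_n y_n}, g^{r_1 t}\big)}\right)^{-x_1 / \langle X, Y \rangle}$$

$$= e\big(g^{-\alpha_1 \langle X, Y \rangle / x_1}, g^{r_1 t}\big)^{-x_1 / \langle X, Y \rangle} = e(g^{\alpha_1}, g^{r_1 t}) = e(g, w)^{r_1 t}.$$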
Abstract. Liskov proposed several weakened versions of the random oracle model, called weakened random oracle models (WROMs), to capture the vulnerability of ideal compression functions, which are expected to have the standard security of hash functions, i.e., collision resistance, second-preimage resistance, and one-wayness. The WROMs offer additional oracles that break these properties of the random oracle. In this paper, we use the WROMs to investigate whether public-key encryption schemes in the random oracle model essentially require the standard security of hash functions. In particular, following the analysis of digital signature schemes in the WROMs by Numayama et al., we deal with four WROMs associated with the standard security of hash functions: the standard, collision tractable, second-preimage tractable, and first-preimage tractable ones (ROM, CT-ROM, SPT-ROM, and FPT-ROM, respectively). We obtain the following results: (1) The OAEP is secure in all four models. (2) The encryption schemes obtained by the Fujisaki-Okamoto conversion (FO) are secure in the SPT-ROM. However, some encryption schemes with FO are insecure in the FPT-ROM. (3) We consider two artificial variants, wFO and dFO, of FO to separate the WROMs in the context of encryption schemes. The encryption schemes with wFO (dFO, respectively) are secure in the CT-ROM (ROM, respectively). However, some encryption schemes obtained by wFO (dFO, respectively) are insecure in the SPT-ROM (CT-ROM, respectively). These results imply that standard encryption schemes such as the OAEP and FO-based ones do not always require the standard security of hash functions. Moreover, to make our security proofs complete, we construct an efficient sampling algorithm for the binomial distribution with exponentially large parameters, which was left open in Numayama et al.'s paper.

Keywords: public-key encryption schemes, weakened random oracle models, 
OAEP, Fujisaki-Okamoto conversion. 


1 Introduction 

Background. In order to design new cryptographic schemes, we often follow the random oracle methodology [1]. First, we analyze the security of cryptographic schemes by idealizing hash functions as truly random functions, called random oracles. When it comes to implementing these schemes, we replace the random oracles with cryptographic hash functions such as MD5 [2] and SHA-1 [3]. This replacement is called an instantiation of the random oracle.

P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010, LNCS 6056, pp. 403–419, 2010.
The random oracle methodology involves a trade-off between efficiency and provable security. Schemes proven secure in the random oracle model (ROM) are in general more efficient than those proven secure in the standard model. However, security proofs in the ROM do not directly guarantee security in the standard model, i.e., an instantiation of the random oracle might make the cryptographic schemes insecure. Even worse, several recent works [4,5,6] showed that some schemes secure in the ROM have no secure instantiation.

Several properties of the ROM are used to prove the security of cryptographic schemes. In particular, the ROM is expected to satisfy one-wayness, second-preimage resistance, and collision resistance. We call these properties the standard security of hash functions. These properties are indeed critical for the security proofs of many schemes. For example, the security of Full-Domain-Hash (FDH) signature schemes (e.g., [7]), which are secure in the ROM, relies on the collision resistance of the ROM. That is, if we can obtain two distinct messages $m, m'$ such that $H(m) = H(m')$ and the signature $\sigma = \mathrm{Sig}(H(m))$, then we obtain a valid forgery $(m', \sigma)$, where $H$ is a hash function and Sig is a signing algorithm. Leurent and Nguyen also presented attacks that extract the secret keys of several hash-then-sign signature schemes and identity-based encryption schemes if the underlying hash functions are not collision resistant [8].
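To make the collision-based forgery concrete, here is a toy RSA-FDH sketch. The 12-bit modulus $N = 61 \cdot 53$ and the hash `toy_hash` (SHA-256 reduced modulo $N$) are our own illustrative choices, picked so that a birthday search finds a collision almost immediately; nothing here is secure. Once $H(m) = H(m')$, the signature on $m$ verifies for $m'$.

```python
import hashlib

# Textbook toy RSA parameters (far too small for real use).
N, E, D = 3233, 17, 2753  # N = 61 * 53, and E * D = 1 mod (60 * 52)


def toy_hash(message: bytes) -> int:
    """Deliberately weak hash into Z_N so that collisions are easy to find."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    return pow(toy_hash(message), D, N)


def verify(message: bytes, sigma: int) -> bool:
    return pow(sigma, E, N) == toy_hash(message)


def find_collision():
    """Birthday search for two distinct messages with the same toy hash."""
    seen = {}
    i = 0
    while True:
        m = b"msg-%d" % i
        h = toy_hash(m)
        if h in seen:
            return seen[h], m
        seen[h] = m
        i += 1


m1, m2 = find_collision()
print(m1, m2, verify(m2, sign(m1)))
```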

Recent progress on attacks against cryptographic hash functions such as MD5 and SHA-1 (e.g., [9,10,11]) calls into question the assumption that hash functions are collision resistant and one-way. Therefore, it is important to investigate whether the collision resistance property of the ROM (as well as the one-wayness and second-preimage resistance properties, which are weaker notions than collision resistance) is essential for proving the security of the schemes. More generally, it is worth classifying schemes according to which of the first-preimage, second-preimage, and collision resistance properties of the ROM their security essentially requires.

Weak versions of random oracle models. Several recent works have highlighted specific properties of the ROM that cryptographic constructions rely on for their security in the ROM.

Nielsen proposed the non-programmable random oracle model, where the random oracle is not programmable [12]. In this model, one cannot set the values that the random oracle answers to convenient values. It was shown in [12] that a non-interactive non-committing encryption scheme exists in the ROM (assuming that trapdoor permutations exist), but not in the non-programmable random oracle model.

Unruh proposed a ROM with oracle-dependent auxiliary inputs [13]. In this setting, adversaries obtain an auxiliary input that contains information about the random oracle (e.g., collisions). He showed that the RSA-OAEP encryption scheme [14] is secure in the ROM even in the presence of oracle-dependent auxiliary inputs.

Liskov proposed several weakened versions of the random oracle model, called weakened random oracle models (WROMs), which offer additional oracles that break some properties of the random oracle [15]. These models capture the situation where adversaries are given an attack algorithm that breaks some specific property of the functions. For example, the first-preimage tractable random oracle model offers the random oracle and the first-preimage oracle associated with it, which returns a first preimage under the random oracle to adversaries. This first-preimage oracle then corresponds to an attack on the first-preimage resistance of a hash function. We can replace the additional oracle with others, such as the second-preimage and collision oracles, which correspond to attacks on the respective properties. Thus, the WROMs can capture the vulnerability of hash functions even if the parties are allowed to use ideal ones as in the ROM. Using WROMs, Liskov constructed hash functions based on weak ideal compression functions and proved them indifferentiable from the random oracle.

Several results have already analyzed security in the WROMs. Hoch and Shamir applied Liskov's idea to prove the indifferentiability of another hash construction [16]. Pasini and Vaudenay also applied Liskov's idea to the security analysis of digital signature schemes [17]. They considered the security of hash-then-sign signature schemes in the first-preimage tractable random oracle model. Numayama, Isshiki, and Tanaka formalized the WROMs, which allows us to formally analyze the security of schemes [18]. Using these models, they classified several digital signature schemes by the properties of the ROM they require. Fischlin and Lehmann also proposed a weakened random oracle model similar to Liskov's in the context of secure combiners [19].

Our contributions. In this paper, we investigate whether public-key encryption schemes constructed in the ROM essentially require the standard security of hash functions, further extending the direction originated by Liskov. In particular, we consider their security in the standard, collision tractable, second-preimage tractable, and first-preimage tractable random oracle models (ROM, CT-ROM, SPT-ROM, and FPT-ROM, respectively, for short). Note that these models are ordered by strength, i.e., the security of an encryption scheme in the FPT-ROM implies its security in the SPT-ROM, and such implications hold between each pair of adjacent models.

We demonstrate that the security notions in the four WROMs can be strictly separated in the context of encryption schemes. For the separation, we focus on the security of the encryption schemes obtained by the Fujisaki-Okamoto conversion (FO) [20], its two artificial variants (dFO and wFO), and the OAEP [14]. Precisely, we prove the following four statements:

1. OAEP is IND-CCA2 secure in the FPT-ROM. 

2. FO is IND-CCA2 secure in the SPT-ROM, but not IND-CPA secure in the FPT-ROM.

3. wFO is IND-CCA2 secure in the CT-ROM, but not IND-CCA2 secure in the 
SPT-ROM. 

4. dFO is IND-CCA2 secure in the ROM, but not IND-CCA2 secure in the CT-ROM.

We summarize the security of the four schemes in Table 1.


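Table 1. Security of four schemes

scheme/model | ROM    | CT-ROM   | SPT-ROM  | FPT-ROM
-------------+--------+----------+----------+---------
OAEP         | secure | secure   | secure   | secure
FO           | secure | secure   | secure   | insecure
wFO          | secure | secure   | insecure | insecure
dFO          | secure | insecure | insecure | insecure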
This separation suggests that some public-key encryption schemes essentially require the standard security of hash functions. These notions were also separated in the context of digital signature schemes in [18]. We stress that the role of the collision and second-preimage oracles in encryption schemes is not as clear as in digital signature schemes. For example, it is easy to see that the collision oracle, which breaks the collision resistance of the random oracle, directly makes a simple signature scheme such as FDH vulnerable, but this is not so easy to see for encryption schemes. Indeed, we need to develop new proof techniques for the (in)security of encryption schemes under additional oracles.

Our results also suggest that standard encryption schemes such as the OAEP and FO-based ones do not always require the standard security of hash functions for the random oracle. We believe that our results not only give the first application of the WROMs to encryption schemes but are also of independent interest. As far as we know, our results give the first evidence that the OAEP encryption scheme can be used in practical applications even without the first-preimage resistance property, i.e., one-wayness. In other words, the OAEP remains secure even if we remove the first-preimage resistance property. The same can be said of FO-based encryption schemes with respect to the second-preimage resistance property.

On the security of the OAEP, Kiltz and Pietrzak recently showed that there is no construction of padding-based encryption schemes, including the OAEP, that has a black-box reduction from ideal trapdoor permutations to its IND-CCA2 security [21]. However, they wrote in the paper that a security proof in the ROM can still be a valid argument in practice. We believe the same holds for our security proofs in the WROMs.

For the security proof, we explicitly show how to sample approximately, in polynomial time, from binomial distributions with exponentially large parameters, that is, we give a polynomial-time sampling algorithm whose output distribution is statistically close to the binomial distribution. For this algorithm, we arrange and combine sampling algorithms over the real numbers proposed in the field of statistics [22,23,24,25], and give a precise analysis of their discretization.

It should be noted that, in the security proofs of the digital signature schemes in the WROMs [18], Numayama et al. assumed such an efficient sampling algorithm and thus gave no explicit construction. They left the construction of the sampling algorithm as an open problem. With the sampling algorithm we explicitly give, it is no longer necessary to assume such an algorithm in their security proofs of the digital signature schemes [18], nor in those of the public-key encryption schemes in this paper.

The sampling algorithm shown in this paper is adapted for cryptographic use, since its statistical closeness to the original distribution is measured by the total variation distance, which is standard in cryptography but not usually required in statistics. The sampling algorithm is useful for other cryptographic tasks, such as those in Numayama et al.'s paper and this one.

Comparisons with other models. As mentioned above, a few models that weaken the power of the random oracle have already been proposed, such as the non-programmable model [12] and the oracle-dependent auxiliary input model [13].

The non-programmable model is not directly comparable with the WROMs, since programmability does not imply collision resistance and vice versa. The target of the oracle-dependent auxiliary input model partially overlaps that of the WROMs.
For a simple comparison, we now focus on the security of the OAEP in both models. Unruh showed a result similar to ours for the OAEP encryption scheme [13]. He proposed a random oracle model where oracle-dependent auxiliary inputs are allowed. In his setting, the adversary against a cryptographic protocol obtains an auxiliary input that contains information (e.g., collisions) about the random oracle. He showed that the OAEP encryption scheme [14] remains secure in this model. This result indicates an important fact: the security of the OAEP encryption scheme does not depend on the collision resistance property, since the oracle-dependent auxiliary input can contain a sufficiently long list of collisions.

Our results also establish the security of the OAEP in a weak version of the random oracle model. However, there are at least two differences between Unruh's result and ours. First, the random oracle model with oracle-dependent auxiliary inputs does not completely capture the adaptive security of hash functions, and this model still has the second-preimage resistance and first-preimage resistance properties. Hence, from his result alone, we cannot say whether these two properties are necessary to prove the security of the OAEP encryption scheme. In contrast, our result clearly shows that two adaptive security properties of hash functions, namely first-preimage resistance and second-preimage resistance, are not necessary to prove the security of the OAEP encryption scheme.

Second, Unruh constructed a reduction algorithm that breaks the partial-domain one-wayness of the underlying trapdoor permutation using an adversary that breaks the IND-CCA2 security of the OAEP encryption scheme. The running time of this reduction algorithm is not bounded by any polynomial. Therefore, he uses a security amplification technique for partial-domain one-wayness. With this technique, he avoids relying on the stronger assumption that even quasi-polynomial-time adversaries cannot break partial-domain one-wayness, and can prove security under standard partial-domain one-wayness against polynomial-time adversaries.

In contrast to Unruh's result, we construct a polynomial-time reduction algorithm using the adversary, and hence we do not require the security amplification technique for partial-domain one-wayness, which can be considered a simplification of Unruh's proof.

Organization. In Section 2, we describe the details of the WROMs and their properties. We also discuss the simulation methods that are applicable to these models. In Section 3, after reviewing the encryption schemes we consider, we show their (in)security in the WROMs. Many technical details are omitted from this extended abstract; we will describe them in the full version [26].

Notation. Before starting the technical parts of this paper, we introduce the notation used in the rest of the paper. For a table $T = \{(x, y)\}$, we define $T(y) = \{(x', y') \in T \mid y' = y\}$. For a distribution $D$, $x \leftarrow D$ denotes that $x$ is sampled according to $D$. The function $D(x)$ stands for the probability function of the distribution $D$.

Let $s \xleftarrow{\$} S$ denote that $s$ is sampled from the uniform distribution over a finite set $S$. $\#S$ denotes the number of elements in $S$. For a probabilistic Turing machine $\mathcal{A}$ and its input $x$, let $\mathcal{A}(x)$ denote the output distribution of $\mathcal{A}$ on input $x$.

We usually denote by $k$ the security parameter of a cryptographic scheme in this paper. We also denote by $k'$ the length of plaintexts unless otherwise specified; $k'$ is implicitly assumed to be polynomially related to the security parameter $k$, that is, $k' = k^{\Theta(1)}$. We say a function $f(k)$ is negligible in $k$ if $f(k) < 2^{-\omega(\log k)}$. For two distributions $D_1$ and $D_2$ over a finite set $S$, we denote the statistical distance (the total variation distance) between them by $\Delta(D_1, D_2)$, defined by $\frac{1}{2} \sum_{s \in S} |D_1(s) - D_2(s)|$. We say two distributions $D_1$ and $D_2$ are statistically close if $\Delta(D_1, D_2) < 2^{-\omega(\log k)}$.
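The statistical distance just defined can be computed directly for small finite distributions. In this sketch, distributions are represented as Python dictionaries mapping outcomes to exact `Fraction` probabilities, a representation chosen only for illustration; outcomes missing from a dictionary have probability 0.

```python
from fractions import Fraction


def statistical_distance(d1, d2):
    """Delta(D1, D2) = 1/2 * sum_s |D1(s) - D2(s)| over the union of the supports."""
    support = set(d1) | set(d2)
    return sum(abs(d1.get(s, 0) - d2.get(s, 0)) for s in support) / 2


uniform4 = {s: Fraction(1, 4) for s in range(4)}
uniform3 = {s: Fraction(1, 3) for s in range(3)}
print(statistical_distance(uniform4, uniform3))  # 1/4
```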

2 The Weakened Random Oracle Models 

In this section, we first review the definitions of the WROMs. Next, we present an important property of the WROMs, called weak uniformity, which is useful for the security proofs of encryption schemes. We also discuss the simulation methods of [18] used for the security proofs in the WROMs.

2.1 Definitions of the Weakened Random Oracle Models 

To give formal definitions of the WROMs, we define some notation. Let $X$ and $Y$ be finite sets. Let $H$ be a hash function chosen randomly from all functions from $X$ to $Y$. We denote by $T_H$ the table $\{(x, H(x)) \mid x \in X\}$. We identify the hash function $H$ with the table $T_H$.

We next define the random oracle and the additional oracles associated with $H: X \to Y$ as follows. (For more details, see [18].)

Random oracle $\mathcal{RO}^H$: Given $x$, return $y$ such that $(x, y) \in T_H$.

Collision oracle $\mathcal{CO}^H$: On a query, first pick one entry $(x, y) \in T_H$ uniformly at random. If there is no other entry $(x', y) \in T_H$, then answer $\perp$. Otherwise, pick one entry $(x', y) \in T_H$ satisfying $x \neq x'$ uniformly at random and answer $(x, x')$.
Second-preimage oracle $\mathcal{SPO}^H$: Given $(x, y)$, if $(x, y) \notin T_H$, answer $\perp$. If there is no other entry $(x', y) \in T_H$, then answer $\perp$. Otherwise, pick one entry $(x', y) \in T_H$ satisfying $x \neq x'$ uniformly at random and answer $x'$.

First-preimage oracle $\mathcal{FPO}^H$: Given $y$, if there is any entry $(x, y) \in T_H$, then return such an $x$ uniformly at random. Otherwise, return $\perp$.
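For intuition, the four oracles can be realized directly when $X$ and $Y$ are tiny, by storing the whole table $T_H$. This is only a toy: the class name `ToyWROM` is hypothetical, `None` plays the role of $\perp$, and storing $T_H$ explicitly is infeasible for the exponentially large $X$ and $Y$ of real hash functions, which is why simulation methods such as those of [18] are needed.

```python
import random


class ToyWROM:
    """Toy WROM: a uniformly random H: X -> Y stored explicitly as the table T_H."""

    def __init__(self, xs, ys, rng):
        self.rng = rng
        self.table = {x: rng.choice(ys) for x in xs}  # T_H = {(x, H(x))}

    def preimages(self, y):
        """The x-coordinates of the entries in T_H(y)."""
        return [x for x, v in self.table.items() if v == y]

    def ro(self, x):
        """Random oracle RO^H."""
        return self.table[x]

    def co(self):
        """Collision oracle CO^H; None plays the role of the bottom symbol."""
        x = self.rng.choice(list(self.table))
        others = [x2 for x2 in self.preimages(self.table[x]) if x2 != x]
        return (x, self.rng.choice(others)) if others else None

    def spo(self, x, y):
        """Second-preimage oracle SPO^H."""
        if x not in self.table or self.table[x] != y:
            return None
        others = [x2 for x2 in self.preimages(y) if x2 != x]
        return self.rng.choice(others) if others else None

    def fpo(self, y):
        """First-preimage oracle FPO^H."""
        pre = self.preimages(y)
        return self.rng.choice(pre) if pre else None


rng = random.Random(1)
H = ToyWROM(range(64), range(16), rng)
print(H.co(), H.fpo(H.ro(5)))
```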

Remark 1. We usually identify the random oracle with the underlying hash function. However, in this paper, as in [18], we explicitly distinguish them by regarding the random oracle as an interface to the underlying hash function. This setting helps us make the WROMs with an additional oracle well defined.

The formal definitions of the WROMs are given as follows. The WROMs consist of three components: a hash function $H$ chosen randomly from all functions from $X$ to $Y$, the random oracle, and the additional oracle associated with $H$. The models are called the CT-ROM, SPT-ROM, and FPT-ROM if the additional oracle is the collision, second-preimage, and first-preimage oracle, respectively.

Remark 2. The collision oracle may output $\perp$ even if there exists a collision $(x, x')$ in the table. This stems from the simulation method of Numayama et al. [18] and causes no serious problems. Note that the collision oracle outputs $\perp$ with probability $(1 - 1/\#Y)^{\#X - 1}$. In the case where $\#X \geq \#Y$, we can find a collision with polynomially many queries since $(1 - 1/\#Y)^{\#X - 1} \leq \exp(-(\#X - 1)/\#Y)$. In the case where $\#Y = k^{O(1)} \cdot \#X$, we can again find a collision with polynomially many queries since $(1 - 1/\#Y)^{\#X - 1} \leq 1 - 1/k^{O(1)}$. Finally, in the case where $\#Y = k^{\omega(1)} \cdot \#X$, the following lemma shows that there are no collisions with overwhelming probability.
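The $\perp$ probability in Remark 2 and its exponential bound are easy to evaluate numerically. The sketch below computes both and adds a small Monte Carlo estimate that samples $H$ lazily (the value of the chosen entry first, then the other $\#X - 1$ values); the function names are our own.

```python
import math
import random


def bot_probability(size_x, size_y):
    """Exact Pr[CO^H answers bottom] = (1 - 1/#Y)^(#X - 1) over a uniformly random H."""
    return (1 - 1 / size_y) ** (size_x - 1)


def bot_bound(size_x, size_y):
    """Upper bound exp(-(#X - 1)/#Y) from Remark 2."""
    return math.exp(-(size_x - 1) / size_y)


def empirical_bot_rate(size_x, size_y, trials, rng):
    """Monte Carlo estimate: bottom iff no other entry shares the chosen entry's value."""
    bots = 0
    for _ in range(trials):
        y0 = rng.randrange(size_y)  # H(x) for the uniformly chosen entry
        if all(rng.randrange(size_y) != y0 for _ in range(size_x - 1)):
            bots += 1
    return bots / trials


print(bot_probability(2**20, 2**20), bot_bound(2**20, 2**20))
print(bot_probability(2**20, 2**40))
```

For $\#X = \#Y = 2^{20}$ the exact value is within $10^{-5}$ of $e^{-1} \approx 0.368$, so a collision appears after a few queries, while for $\#X = 2^{20}$ and $\#Y = 2^{40}$ the oracle answers $\perp$ with probability above $0.999$.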

Lemma 1. Let $H: X \to Y$ be the hash function, and let $n_y$ be the number of preimages of $y$ under $H$, that is, $n_y = \#T_H(y)$. Let BAD denote the event that there is some
y such that n y > L. Then for all sufficiently large Y, we have Pr//[ BAD] < where 
L=$g&§if#X>#Y,orL =§&§ otherwise. 

The proof follows from the standard balls-and-bins argument, regarding $X$ and $Y$ as the sets of balls and bins, respectively. For details on this game, see a standard textbook (e.g., [27]).

2.2 Difference from the Random Oracle Model 

We observe an important difference between the ROM and the WROMs by considering the ROM and the FPT-ROM. In both models, the function $H$, i.e., the table $T_H$, is uniformly distributed.

In the ROM, if one queries some $x$ that has never been queried to the random oracle, the value of $H(x)$ is uniformly distributed regardless of the past queries. That is, knowledge of the past queries does not affect the entries of the table that have not been queried. This property of the ROM is called uniformity. In contrast to the ROM, this property does not hold in the FPT-ROM. Recall that the first-preimage oracle returns one of the preimages, say $x$, of a queried value $y$ uniformly at random. If the first-preimage oracle leaks a number of preimages of $y$, the value of $H(x)$ is not uniformly distributed for an $x$ not queried yet.

To observe this situation, let us consider the following extreme case. Let $y^* = H(x^*)$ for some $x^* \in X$ and suppose that $y^*$ has the unique preimage $x^*$. Then the first-preimage oracle always returns the same $x^*$ on input $y^*$, which convinces us that the number of preimages of $y^*$ is exactly 1. This implies that no other $x \neq x^*$ is mapped to $y^*$ under $H$. Therefore, the random oracle no longer has uniformity in the FPT-ROM. This is a critical difference between the ROM and the FPT-ROM, since we often make use of uniformity in the security proofs of public-key encryption schemes.
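The extreme case can be verified by brute force on a toy domain. The sketch below, with hypothetical function names, enumerates every function $H: X \to Y$ with $X = Y = \{0, \ldots, \mathit{size} - 1\}$, keeps those in which $x^*$ is the unique preimage of $y^*$, and computes the conditional distribution of $H(x)$ for some $x \neq x^*$. In this case $H(x)$ is uniform on $Y \setminus \{y^*\}$, so its statistical distance from the uniform distribution on $Y$ is exactly $1/\#Y$.

```python
from fractions import Fraction
from itertools import product


def conditional_distribution(size, x_star, y_star, x_other):
    """Distribution of H(x_other) over all H: X -> Y (X = Y = range(size)),
    conditioned on x_star being the unique preimage of y_star."""
    counts = {y: 0 for y in range(size)}
    total = 0
    for table in product(range(size), repeat=size):  # table[x] = H(x)
        if [x for x in range(size) if table[x] == y_star] == [x_star]:
            counts[table[x_other]] += 1
            total += 1
    return {y: Fraction(c, total) for y, c in counts.items()}


def distance_from_uniform(dist):
    """Statistical distance between dist and the uniform distribution on its keys."""
    u = Fraction(1, len(dist))
    return sum(abs(p - u) for p in dist.values()) / 2


dist = conditional_distribution(3, x_star=0, y_star=0, x_other=1)
print(dist[0], distance_from_uniform(dist))  # 0 1/3
```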

We prove the following lemma to overcome this barrier in the WROMs; it states that the WROMs still have weak uniformity instead of uniformity. Weak uniformity is still useful for the security proofs of public-key encryption schemes in the WROMs.

Lemma 2 (Weak Uniformity). In the WROMs, the output distribution of the random oracle is statistically close to the uniform distribution. More formally, it is stated as follows. Let $H: X \to Y$ be the hash function in the WROMs. Let $\mathcal{A}$ be a probabilistic oracle Turing machine that makes at most $q$ queries to the random oracle $\mathcal{RO}^H$ and the additional oracle $\mathcal{O}^H$, where $\mathcal{O}^H$ represents one of the additional oracles $\mathcal{CO}^H$, $\mathcal{SPO}^H$, and $\mathcal{FPO}^H$. Let $V^H_{\mathcal{A}}(x)$ denote the random variable that represents the hash value $\mathcal{RO}^H(x)$, where $x \leftarrow \mathcal{A}^{\mathcal{RO}^H, \mathcal{O}^H}$ and the correspondence $(x, H(x)) \in T_H$ has not been answered by the two oracles.

Then, for any $\mathcal{A}$, the following holds:

A(V&'H(x), Uy) < 

Here, the probability is taken over the random choice of the hash function $H$ and the random coins of $\mathcal{A}$.

2.3 Simulation Methods 

In almost all security proofs in the ROM, the reduction algorithms simulate the random oracles. In the security proofs in the WROMs, the reduction algorithms have to simulate both the random oracle and the additional oracle, which makes the simulation methods in the WROMs differ from those in the ROM.

Numayama et al.'s methods. Numayama et al. proposed simulation methods for the WROMs, but they required an unproven assumption. Let $B_{N,p}$ denote the binomial distribution with parameters $N$ and $p$, whose probability function is $B_{N,p}(x) = \binom{N}{x} p^x (1 - p)^{N - x}$ for $x = 0, \ldots, N$, where the parameters $N$ and $p$ take values of approximately $\#X$ and $1/\#Y$ for a hash function $H: X \to Y$, say, $(N, p) = (2^{128}, 2^{-128})$. Their simulation methods required an efficient sampler for $B_{N,p}$ with exponentially large $N$ and small $p$, and they assumed its existence.
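For moderate parameters, $B_{N,p}$ can be sampled by textbook sequential-search inversion, sketched below. This is not the algorithm of this paper: it uses floating-point arithmetic, and its discretization error is exactly what a careful analysis in the bit computation model has to bound. Its expected number of iterations is about $Np + 1$, so it is fast when $Np$ is small (e.g., $N \approx \#X$ and $p \approx 1/\#Y$ with $\#X \approx \#Y$), but it does not satisfy Assumption 1 when $Np$ is exponentially large.

```python
import math
import random


def binomial_inversion(n: int, p: float, rng: random.Random) -> int:
    """Sample B_{n,p} by sequential-search inversion (floating-point sketch).

    Returns the smallest k with u < P[X <= k] for u uniform in [0, 1).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    prob = math.exp(n * math.log1p(-p))  # P[X = 0] = (1 - p)^n
    if prob == 0.0:
        raise ValueError("mean n*p too large for this simple sketch")
    ratio = p / (1.0 - p)
    cdf, k = prob, 0
    u = rng.random()
    while u >= cdf and k < n:
        prob *= (n - k) / (k + 1) * ratio  # P[X = k + 1] from P[X = k]
        if prob == 0.0:                     # floating-point underflow: stop searching
            break
        k += 1
        cdf += prob
    return k


rng = random.Random(2010)
samples = [binomial_inversion(2**128, 2.0**-128, rng) for _ in range(10000)]
print(sum(samples) / len(samples))  # close to N * p = 1
```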

Assumption 1. There is a probabilistic Turing machine Bn such that the output distribution Bn(N, p) on inputs $N$ and $p$ is equal to the binomial distribution $B_{N,p}$, and it runs in time polynomial in $\log N$ and $\log p^{-1}$, where $N$ is a positive integer and $0 < p < 1$ is a rational number.

Under this assumption, they constructed the simulation algorithms RO, CO, SPO, and FPO for the security proofs in the WROMs, as given in the following proposition. See [18] for the details of the algorithms.

Proposition 1 (Simulation Method [18]). We can perfectly simulate the random oracle, collision oracle, second-preimage oracle, and first-preimage oracle in the WROMs under Assumption 1. That is, under Assumption 1, the output distributions of the random oracle, collision oracle, second-preimage oracle, and first-preimage oracle in the WROMs are identical to the output distributions of the algorithms RO, CO, SPO, and FPO, respectively.

Removing the assumption. For the security proofs in the WROMs of the digital signature schemes in [18] and of the encryption schemes in this paper, a weaker sampling algorithm suffices. It only needs to generate a distribution that is statistically close to the binomial distribution $B_{N,p}$, not equal to it. The security proofs then still work after adding, in their analyses, the negligibly small errors induced by the statistical distance.
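The statistical distance used here is $\Delta(X, Y) = \frac{1}{2}\sum_{a} |\Pr[X = a] - \Pr[Y = a]|$. The sampler of [25] and its bit-model analysis are not given in this text, so the Python sketch below does not implement them.

Purely as an illustration of "statistically close to $B_{N,p}$", it computes $\Delta$ between $B_{N,p}$ and a Poisson distribution with mean $Np$. The Poisson distribution is a stand-in chosen only for this example. The function names are hypothetical. The code works in floating point and sums over all $N+1$ outcomes, so it suits small $N$ only, far from the $N \approx 2^{128}$ regime above.

```python
import math

def binomial_pmf(n: int, p: float, x: int) -> float:
    """Pr[B_{n,p} = x], evaluated in log space so that large n does not overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
               + x * math.log(p) + (n - x) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(lam: float, x: int) -> float:
    """Pr[Poisson(lam) = x], also evaluated in log space."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def distance_binomial_poisson(n: int, p: float) -> float:
    """Statistical distance (1/2) * sum_x |B_{n,p}(x) - Poisson(np)(x)| over all x >= 0."""
    lam = n * p
    total, poisson_mass = 0.0, 0.0
    for x in range(n + 1):
        q = poisson_pmf(lam, x)
        poisson_mass += q
        total += abs(binomial_pmf(n, p, x) - q)
    # For x > n the binomial probability is 0, so those outcomes contribute
    # exactly the remaining Poisson mass.
    total += max(0.0, 1.0 - poisson_mass)
    return total / 2

for n, p in [(10, 0.5), (1000, 0.001)]:
    print(f"N={n}, p={p}: distance ~ {distance_binomial_poisson(n, p):.2e}")
```

The distance shrinks as $p$ gets small, which is the sense in which an approximate sampler can replace an exact one at the cost of an additive error term.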


^( 5 ?+l + S + 209iS| 

£59+1 + ^+209^) if#X<#Y. 
There are many papers (e.g., [25]) on efficient sampling methods for the binomial distribution in the field of statistics. However, their basic computation model is totally different from the model used in cryptography. To the best of the authors' knowledge, all these results are based on a computation model that directly manipulates real numbers without errors. Translating them into the bit computation model used in cryptography requires bounding the statistical distance between the true distribution and the output distribution of the sampling algorithms in the bit computation model, rather than in the real-number one. Numayama et al. mentioned in [18] that they could neither find precise analyses of this statistical distance nor construct the sampling algorithms themselves. Therefore, they had to make the above assumption.

In fact, there is an efficient sampling algorithm appropriate for our purpose in the real-number computation model [25]. We modify the algorithm and rigorously analyze the error bound in the bit computation model. This yields the following theorem on the sampling algorithm.

Theorem 1. There is a probabilistic Turing machine $\mathsf{BN}$ with the following property. For the output distribution $\mathsf{BN}(N, p, \epsilon)$ on inputs $N$, $p$, and $\epsilon$, the statistical distance between $\mathsf{BN}(N, p, \epsilon)$ and $B_{N,p}$ is at most $\epsilon$, and $\mathsf{BN}$ runs in time polynomial in $\log N$, $\log p^{-1}$, and $\log \epsilon^{-1}$. Here $N$ is a positive integer and $0 < p < 1$, $0 < \epsilon < 1$ are rational numbers.

Note that the algorithm can control the error parameter $\epsilon$. This property is useful in the security proofs of cryptographic applications even if the other parameters $N$ and $p$ are not sufficiently large. We will give the details of the algorithm and its analysis in the full version.

As a result, we can remove the above assumption and obtain the following theorem. 

Theorem 2 (Simulation Method without Assumption 1). We can statistically simulate the random oracle, collision oracle, second-preimage oracle, and first-preimage oracle in the WROMs. That is, the output distributions of the oracles in the WROMs are statistically close to the output distributions of the algorithms RO, CO, SPO, and FPO, respectively.

3 The Encryption Schemes and Their Security in the Weakened 
Random Oracle Models 

In this section, we examine the security of public-key encryption schemes in the WROMs. In particular, we discuss separations among the notions of ROM, CT-ROM, SPT-ROM, and FPT-ROM. We do so by showing the (in)security of public-key encryption schemes obtained by the Fujisaki-Okamoto conversion (FO), its two variants (dFO and wFO), and OAEP.

Public-key encryption schemes. We first briefly give notation and notions for public-key encryption schemes. For details, see standard textbooks, e.g., [28].

A public-key encryption scheme $\mathcal{PKE} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ over a plaintext space $\mathcal{M}$ and a random coin space $\mathcal{R}$ is defined by the following three algorithms. Let $k$ denote the security parameter.

Key Generation: On input $1^k$, the key generation algorithm $\mathsf{Gen}(1^k)$ produces a public/secret key pair $(pk, sk)$.

Encryption: Given a public key $pk$, a plaintext $m \in \mathcal{M}$, and a random string $r \in \mathcal{R}$, the encryption algorithm $\mathsf{Enc}_{pk}(m; r)$ outputs a ciphertext $c$ corresponding to the plaintext $m$.

Decryption: Given a secret key $sk$ and a ciphertext $c$, the decryption algorithm $\mathsf{Dec}_{sk}(c)$ outputs the plaintext $m \in \mathcal{M}$, or the special symbol $\bot \notin \mathcal{M}$, corresponding to the ciphertext $c$.

We require perfect completeness. That is, for every $(pk, sk)$ generated by $\mathsf{Gen}(1^k)$, every plaintext $m \in \mathcal{M}$, and every random string $r \in \mathcal{R}$, it holds that $\mathsf{Dec}_{sk}(\mathsf{Enc}_{pk}(m; r)) = m$.

We consider only three standard security notions for public-key encryption schemes:

- one-wayness against chosen-plaintext attack (OW-CPA),
- indistinguishability against chosen-plaintext attack (IND-CPA),
- indistinguishability against adaptive chosen-ciphertext attack (IND-CCA2).

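For $\gamma = \gamma(k)$, we say that $\mathcal{PKE}$ is $\gamma$-uniform if the following holds for any key pair $(pk, sk)$ generated by $\mathsf{Gen}(1^k)$, any $m \in \mathcal{M}$, and any $c \in \{0,1\}^*$: $\Pr_{r \leftarrow \mathcal{R}}[c = \mathsf{Enc}_{pk}(m; r)] \le \gamma$. There exists a OW-CPA public-key encryption scheme with $\gamma$-uniformity (e.g., the ElGamal encryption scheme).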
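A minimal Python sketch of the syntax above, assuming textbook ElGamal over a tiny subgroup. The parameters `P = 23`, `Q = 11`, `GEN = 4` and all helper names are hypothetical toy choices with no security.

- The coins $r$ are passed explicitly, as in $\mathsf{Enc}_{pk}(m; r)$.
- Perfect completeness is checked exhaustively.
- $\gamma$-uniformity holds with $\gamma = 1/Q$, because distinct coins give distinct first components $\mathrm{GEN}^r$.

```python
import secrets

# Toy parameters (hypothetical, insecure): P = 2Q + 1 is a safe prime and
# GEN = 4 generates the subgroup of order Q, which serves as the plaintext space M.
P, Q, GEN = 23, 11, 4

def gen() -> tuple[int, int]:
    """Gen: returns (pk, sk) = (GEN^x mod P, x) for a random x in [1, Q-1]."""
    x = secrets.randbelow(Q - 1) + 1
    return pow(GEN, x, P), x

def enc(pk: int, m: int, r: int) -> tuple[int, int]:
    """Enc_pk(m; r) with explicit coins r in R = Z_Q; m must lie in the subgroup."""
    return pow(GEN, r, P), m * pow(pk, r, P) % P

def dec(sk: int, c: tuple[int, int]) -> int:
    """Dec_sk(c1, c2) = c2 / c1^sk mod P (negative exponents need Python 3.8+)."""
    c1, c2 = c
    return c2 * pow(c1, -sk, P) % P

MESSAGES = [pow(GEN, e, P) for e in range(Q)]  # all elements of M

pk, sk = gen()
# Perfect completeness: Dec(Enc(m; r)) = m for every m in M and every r in R.
assert all(dec(sk, enc(pk, m, r)) == m for m in MESSAGES for r in range(Q))
# gamma-uniformity with gamma = 1/Q: r -> Enc(m; r) is injective for each fixed m.
assert all(len({enc(pk, m, r) for r in range(Q)}) == Q for m in MESSAGES)
```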

Brief review of FO. Fujisaki and Okamoto proposed a conversion, called the Fujisaki-Okamoto (FO) conversion, to obtain highly secure public-key encryption schemes in the ROM [20]. The standard one-time pad satisfies the requirement of the FO conversion. For simplicity, we therefore fix the one-time pad as the symmetric-key encryption scheme used in the FO conversion.

Let $\mathcal{PKE}$ be a OW-CPA secure and $\gamma$-uniform public-key encryption scheme over a plaintext space $\mathcal{M}$ and a randomness space $\mathcal{R}$. The FO conversion turns $\mathcal{PKE}$ into an IND-CCA2 secure scheme $\mathcal{PKE}' = \mathrm{FO}(\mathcal{PKE})$. This scheme has plaintext space $\mathcal{M}' = \{0,1\}^{k'}$ and randomness space $\mathcal{R}' = \mathcal{M}$, where $k'$ denotes the length of plaintexts and is polynomially related to the security parameter $k$.

The encryption procedure of $\mathcal{PKE}'$ is as follows. For a plaintext $m \in \mathcal{M}' = \{0,1\}^{k'}$ and a random string $r \in \mathcal{R}' = \mathcal{M}$, the ciphertext is

$$(c_1, c_2) = (\mathsf{Enc}_{pk}(r; H(m, r)),\ G(r) \oplus m),$$

where $H : \{0,1\}^{k'} \times \mathcal{M} \to \mathcal{R}$ and $G : \mathcal{M} \to \{0,1\}^{k'}$ are hash functions modeled as random oracles.

The decryption procedure is as follows. For a given ciphertext $(c_1, c_2)$, decrypt $c_1$ with $sk$ to obtain $r$. Then compute $m = c_2 \oplus G(r)$ and verify that $c_1 = \mathsf{Enc}_{pk}(r; H(m, r))$. If the check fails, output $\bot$; otherwise, output $m$.

Roughly speaking, $H(m, r)$ ensures that if a ciphertext $(c_1, c_2)$ is valid, then the encryptor producing $(c_1, c_2)$ knows the corresponding $m$ and $r$.
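The FO conversion can be sketched in Python as follows. This is an illustration under stated assumptions, not a secure implementation:

- The underlying scheme is toy ElGamal with hypothetical parameters `P = 2039`, `Q = 1019`, `GEN = 4`.
- The random oracles $G$ and $H$ are replaced by domain-separated SHA-256. A concrete hash is not a random oracle.
- The plaintext length is a hypothetical $k' = 16$.
- `None` stands for $\bot$.

```python
import hashlib
import secrets

# Hypothetical toy parameters, far too small for any real use:
# P = 2Q + 1 is a safe prime and GEN = 4 generates the order-Q subgroup M.
P, Q, GEN = 2039, 1019, 4
K_BITS = 16  # k', the plaintext length of the converted scheme

def _hash_int(tag: bytes, *parts: int) -> int:
    """SHA-256 with a domain-separation tag, standing in for a random oracle."""
    h = hashlib.sha256(tag)
    for x in parts:
        h.update(x.to_bytes(32, "big"))
    return int.from_bytes(h.digest(), "big")

def G(r: int) -> int:          # G : M -> {0,1}^{k'}
    return _hash_int(b"G", r) % (1 << K_BITS)

def H(m: int, r: int) -> int:  # H : {0,1}^{k'} x M -> R = Z_Q
    return _hash_int(b"H", m, r) % Q

def elgamal_keygen():
    x = secrets.randbelow(Q - 1) + 1
    return pow(GEN, x, P), x

def elgamal_enc(pk, msg, coins):
    """Enc_pk(msg; coins) with explicit coins, as in the PKE syntax."""
    return pow(GEN, coins, P), msg * pow(pk, coins, P) % P

def elgamal_dec(sk, c):
    c1, c2 = c
    return c2 * pow(c1, -sk, P) % P

def fo_enc(pk, m):
    """FO encryption: (Enc_pk(r; H(m, r)), G(r) xor m) for a fresh r in M."""
    r = pow(GEN, secrets.randbelow(Q), P)
    return elgamal_enc(pk, r, H(m, r)), G(r) ^ m

def fo_dec(pk, sk, ct):
    """FO decryption with the re-encryption check; None stands for the rejection symbol."""
    c1, c2 = ct
    r = elgamal_dec(sk, c1)
    m = c2 ^ G(r)
    if elgamal_enc(pk, r, H(m, r)) != c1:
        return None
    return m

pk, sk = elgamal_keygen()
assert fo_dec(pk, sk, fo_enc(pk, 0xBEEF)) == 0xBEEF
```

Flipping a bit of $c_2$ changes the recovered $m$. The re-encryption check then fails unless $H$ happens to collide, which in this toy setting occurs with probability about $1/Q$.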

3.1 The First Variant dFO 

We introduce the first artificial variant dFO and show that dFO is secure in the ROM, 
but not secure in general in the CT-ROM. 

The variant dFO converts a public-key encryption scheme $\mathcal{PKE}$ (with the one-time pad) into another public-key encryption scheme $\mathcal{PKE}' = \mathrm{dFO}(\mathcal{PKE})$, similarly to FO. The encryption procedure of $\mathcal{PKE}'$ is defined as follows. For a plaintext $m \in \mathcal{M}' = \{0,1\}^{k'}$ and a random string $r \in \mathcal{R}' = \mathcal{M}$, the ciphertext of $\mathcal{PKE}'$ is

$$(c_1, c_2) = (\mathsf{Enc}_{pk}(r; H(F(m), r)),\ G(r) \oplus m),$$


Security of Encryption Schemes in Weakened Random Oracle Models 




where $F : \{0,1\}^{k'} \to \mathcal{P}$, $G : \mathcal{M} \to \{0,1\}^{k'}$, and $H : \mathcal{P} \times \mathcal{M} \to \mathcal{R}$, for an appropriate set $\mathcal{P}$, are hash functions modeled as random oracles.

The idea behind weakening the conversion is as follows. Recall that $H(m, r)$ in the FO conversion can be viewed as the encryptor's signature (or a proof of knowledge) on $m$ and $r$. To make it vulnerable to collisions, we introduce a new random oracle $F$ and replace $H(m, r)$ with $H(F(m), r)$. The replacement does not harm security in the random oracle model. However, it can be exploited in the presence of the collision oracle $\mathcal{CO}_F$.

Formally, we have the following theorems on the (in)security. We omit the proof of Theorem 3, which is similar to the original one.

Theorem 3. Assume that $\mathcal{PKE}$ is a OW-CPA secure and $\gamma$-uniform public-key encryption scheme for some negligible $\gamma$. Then, $\mathcal{PKE}' = \mathrm{dFO}(\mathcal{PKE})$ is IND-CCA2 secure in
the ROM ifW = 2^ k) . 

Theorem 4. Let $\mathcal{PKE}$ be a public-key encryption scheme. If $\#\mathcal{P} < 2^{k'}$, then $\mathcal{PKE}' = \mathrm{dFO}(\mathcal{PKE})$ is not IND-CCA2 secure in the CT-ROM.

Proof. We construct an adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ that breaks the IND-CCA2 security of $\mathcal{PKE}'$ by exploiting the collision oracle $\mathcal{CO}_F$ of $F$.

The adversary $\mathcal{A}_1$, on input $pk$, first queries $\mathcal{CO}_F$. If the answer is $\bot$, the adversary flips a fair coin $b'$, outputs $b'$, and halts. Otherwise, it obtains a collision $(m_0, m_1)$ of $F$ and outputs it as the challenge.

The adversary $\mathcal{A}_2$ receives the target ciphertext $(c_1^*, c_2^*) = (\mathsf{Enc}_{pk}(r; H(F(m_b), r)), G(r) \oplus m_b)$ for some $r \in \mathcal{R}'$. It queries $(c_1', c_2') = (c_1^*, c_2^* \oplus m_0 \oplus m_1)$ to the decryption oracle; this query differs from the target ciphertext since $m_0 \neq m_1$. The oracle returns $m_{1-b}$, since

$$\begin{aligned} c_1' &= \mathsf{Enc}_{pk}(r; H(F(m_b), r)) = \mathsf{Enc}_{pk}(r; H(F(m_{1-b}), r)), \\ c_2' &= G(r) \oplus m_b \oplus m_0 \oplus m_1 = G(r) \oplus m_{1-b}. \end{aligned}$$

Hence, the adversary can answer $b' = b$ correctly.

Finally, we upper-bound the probability that the collision oracle outputs $\bot$, which stems from the definition of the collision oracle. This probability is bounded by $(1 - 1/\#\mathcal{P})^{2^{k'}-1} < \exp(-(2^{k'}-1)/\#\mathcal{P}) < 1/\sqrt{e}$. This completes the proof. □
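The attack from the proof of Theorem 4 can be run on a toy instance. The sketch below makes these hypothetical choices, none of which is secure:

- a toy ElGamal group;
- domain-separated SHA-256 in place of the random oracles $F$, $G$, $H$;
- $k' = 16$;
- an $F$ whose range has the deliberately tiny size $\#\mathcal{P} = 64 < 2^{k'}$.

The collision oracle $\mathcal{CO}_F$ is simulated by brute force, which is feasible only because the range of $F$ is tiny. The adversary submits the collision as its challenge pair, mauls $c_2^*$, and reads $b$ off the decryption oracle's answer.

```python
import hashlib
import secrets

# Hypothetical toy parameters: an insecure ElGamal group, k' = 16, and a
# deliberately tiny range for F with #P = 64 < 2^{k'}, so collisions exist.
P, Q, GEN = 2039, 1019, 4
K_BITS, P_SIZE = 16, 64

def _hash_int(tag: bytes, *parts: int) -> int:
    """SHA-256 with a domain-separation tag, standing in for a random oracle."""
    h = hashlib.sha256(tag)
    for x in parts:
        h.update(x.to_bytes(32, "big"))
    return int.from_bytes(h.digest(), "big")

def F(m):     # F : {0,1}^{k'} -> P
    return _hash_int(b"F", m) % P_SIZE

def G(r):     # G : M -> {0,1}^{k'}
    return _hash_int(b"G", r) % (1 << K_BITS)

def H(f, r):  # H : P x M -> R = Z_Q
    return _hash_int(b"H", f, r) % Q

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return pow(GEN, x, P), x

def elgamal_enc(pk, msg, coins):
    return pow(GEN, coins, P), msg * pow(pk, coins, P) % P

def dfo_enc(pk, m):
    r = pow(GEN, secrets.randbelow(Q), P)
    return elgamal_enc(pk, r, H(F(m), r)), G(r) ^ m

def dfo_dec(pk, sk, ct):
    (u, v), c2 = ct
    r = v * pow(u, -sk, P) % P
    m = c2 ^ G(r)
    return m if elgamal_enc(pk, r, H(F(m), r)) == (u, v) else None

def collision_oracle():
    """Brute-force stand-in for CO_F; pigeonhole guarantees success since #P < 2^{k'}."""
    seen = {}
    for m in range(1 << K_BITS):
        f = F(m)
        if f in seen:
            return seen[f], m
        seen[f] = m
    return None

def ind_cca2_game(b):
    """One run of the Theorem 4 adversary against a challenger with hidden bit b."""
    pk, sk = keygen()
    pair = collision_oracle()
    if pair is None:                              # CO_F answered "no collision": guess
        return secrets.randbelow(2)
    m0, m1 = pair
    c1, c2 = dfo_enc(pk, (m0, m1)[b])             # challenge ciphertext
    answer = dfo_dec(pk, sk, (c1, c2 ^ m0 ^ m1))  # allowed: differs from the challenge
    return 0 if answer == m1 else 1               # decryption returned m_{1-b}

print([ind_cca2_game(b) for b in (0, 1, 0, 1)])  # prints [0, 1, 0, 1]
```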

3.2 The Second Variant wFO 

We next introduce the second artificial variant wFO. We show that the scheme obtained by wFO is secure in the CT-ROM but, in general, not secure in the SPT-ROM.

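The encryption procedure of $\mathcal{PKE}' = \mathrm{wFO}(\mathcal{PKE})$ is as follows. For a plaintext $m \in \mathcal{M}' = \{0,1\}^{k'}$ and random strings $(r, s) \in \mathcal{R}' = \mathcal{M} \times \mathcal{S}$, the ciphertext of $\mathcal{PKE}'$ is

$$(c_1, c_2, c_3) = (\mathsf{Enc}_{pk}(r; H(F(m, s), r)),\ G(r) \oplus m,\ s),$$

where $F : \{0,1\}^{k'} \times \mathcal{S} \to \mathcal{P}$, $G : \mathcal{M} \to \{0,1\}^{k'}$, and $H : \mathcal{P} \times \mathcal{M} \to \mathcal{R}$ are hash functions modeled as random oracles.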

Notice that $(H(F(m, s), r), s)$ is a proof of knowledge on $(m, r, s)$. It resists collisions on $F$ but is vulnerable to a second-preimage attack against $F$, as in Numayama et al. [18].

We can show that the obtained scheme is IND-CCA2 secure in the CT-ROM by using Lemma 2.

Theorem 5. Suppose that $\mathcal{PKE}$ is a OW-CPA secure and $\gamma$-uniform public-key encryption scheme for some negligible $\gamma$. Then, $\mathcal{PKE}' = \mathrm{wFO}(\mathcal{PKE})$ is IND-CCA2 secure in the CT-ROM if $\#\mathcal{P}^{-1}$ and $\#\mathcal{S}^{-1}$ are negligible in $k$.

However, its security is broken in the presence of the second-preimage oracle for $F$.


Theorem 6. Let $\mathcal{PKE}$ be a public-key encryption scheme. If $\#\mathcal{P} < 2^{k'}\cdot\#\mathcal{S}$, then the scheme $\mathcal{PKE}' = \mathrm{wFO}(\mathcal{PKE})$ is not IND-CCA2 secure in the SPT-ROM.

Proof. We construct an adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ that exploits the second-preimage oracle $\mathcal{SPO}_F$ associated with $F$. The adversary $\mathcal{A}_1$ chooses random distinct plaintexts $m_0$ and $m_1$ and submits them to the challenger. The challenger responds with

$$(c_1^*, c_2^*, c_3^*) = (\mathsf{Enc}_{pk}(r; H(F(m_b, s), r)),\ G(r) \oplus m_b,\ s).$$

On receiving $(c_1^*, c_2^*, c_3^*)$, the adversary $\mathcal{A}_2$ queries $(m_0, s)$ to the second-preimage oracle $\mathcal{SPO}_F$. If it receives $\bot$ from the second-preimage oracle, it flips a fair coin $b'$, outputs $b'$, and halts. Otherwise, it obtains $(m', s') \neq (m_0, s)$ such that $F(m_0, s) = F(m', s')$. The adversary then queries

$$(c_1', c_2', c_3') = (c_1^*,\ c_2^* \oplus m_0 \oplus m',\ s')$$

to the decryption oracle. Notice that if $(c_1^*, c_2^*, c_3^*)$ is a valid ciphertext of $m_0$, then we have

$$\begin{aligned} c_1' &= \mathsf{Enc}_{pk}(r; H(F(m_0, s), r)) = \mathsf{Enc}_{pk}(r; H(F(m', s'), r)), \\ c_2' &= G(r) \oplus m_0 \oplus m_0 \oplus m' = G(r) \oplus m', \\ c_3' &= s', \end{aligned}$$

so $(c_1', c_2', c_3')$ is a valid ciphertext of $m'$. On the other hand, if the challenge ciphertext is an encryption of $m_1$, we have

$$(c_1', c_2', c_3') = (\mathsf{Enc}_{pk}(r; H(F(m_1, s), r)),\ G(r) \oplus m_1 \oplus m_0 \oplus m',\ s').$$

In this case, if $F(m_1, s)$ equals $F(m_1 \oplus m_0 \oplus m', s')$, the decryption oracle returns $m_1 \oplus m_0 \oplus m'$ $(\neq m')$. Otherwise, it returns $\bot$.

Hence, if the answer is $m'$, the adversary concludes that $(c_1^*, c_2^*, c_3^*)$ is the ciphertext of $m_0$ and outputs $b' = 0$. Otherwise, it concludes that it is the ciphertext of $m_1$ and outputs $b' = 1$. Therefore, $\mathcal{A}$ outputs the correct answer unless it receives $\bot$ from the second-preimage oracle.

We finally bound the probability that the oracle outputs $\bot$. It is bounded by $(1 - 1/\#\mathcal{P})^{2^{k'}\cdot\#\mathcal{S} - 1} < \exp(-(2^{k'}\cdot\#\mathcal{S} - 1)/\#\mathcal{P}) < 1/\sqrt{e}$, as required. This completes the proof. □
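A quick numeric check of the inequality chain bounding the probability of $\bot$ in the proofs of Theorems 4 and 6. For a domain of size $n$ (namely $n = 2^{k'}$ or $n = 2^{k'}\cdot\#\mathcal{S}$), the sketch evaluates $(1 - 1/\#\mathcal{P})^{n-1}$ and $\exp(-(n-1)/\#\mathcal{P})$ and compares both with $1/\sqrt{e}$. The parameter values are hypothetical examples that satisfy the theorems' hypotheses, and the function name is illustrative.

```python
import math

def bot_probability_bounds(domain_size: int, range_size: int) -> tuple[float, float]:
    """Return ((1 - 1/#P)^(n - 1), exp(-(n - 1)/#P)) for domain size n and #P = range_size."""
    e = domain_size - 1
    lhs = math.exp(e * math.log1p(-1 / range_size))  # log1p keeps the power accurate
    rhs = math.exp(-e / range_size)
    return lhs, rhs

# Theorem 4 setting: domain {0,1}^{k'} with k' = 16 and #P = 2^16 - 1 < 2^{k'}.
print(bot_probability_bounds(2**16, 2**16 - 1), 1 / math.sqrt(math.e))
# Theorem 6 setting: domain {0,1}^{k'} x S with k' = 8, #S = 256, #P = 2^16 - 1.
print(bot_probability_bounds(2**8 * 256, 2**16 - 1))
```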




3.3 The Original Fujisaki-Okamoto Conversion 

We next show that the scheme obtained by the FO conversion with the one-time pad is secure in the SPT-ROM, but not secure in the FPT-ROM for some parameter settings.

Let $G : \mathcal{M} \to \{0,1\}^{k'}$ and $H : \{0,1\}^{k'} \times \mathcal{M} \to \mathcal{R}$ be hash functions modeled as random oracles. Recall the encryption procedure of $\mathcal{PKE}' = \mathrm{FO}(\mathcal{PKE})$. For a plaintext $m \in \mathcal{M}' = \{0,1\}^{k'}$ and a random string $r \in \mathcal{R}' = \mathcal{M}$, the ciphertext is $(\mathsf{Enc}_{pk}(r; H(m, r)), G(r) \oplus m)$.

By modifying the existing proofs and using Lemma 2, we can show that the scheme is secure in the SPT-ROM.

Theorem 7. Suppose that $\mathcal{PKE}$ is OW-CPA secure and $\gamma$-uniform for some negligible $\gamma$. Then, $\mathcal{PKE}' = \mathrm{FO}(\mathcal{PKE})$ is IND-CCA2 secure in the SPT-ROM.

However, the presence of the first-preimage oracle for $G$ violates the IND-CPA security of $\mathcal{PKE}'$ for some parameter settings. Note that if $m = 0^{k'}$, the second component of the ciphertext is $G(r)$, which is vulnerable to the first-preimage oracle of $G$.

Theorem 8. Let $C = \#\mathcal{M}/2^{k'}$. Assume that $C = k^{O(1)}$. Then, $\mathcal{PKE}' = \mathrm{FO}(\mathcal{PKE})$ is not IND-CPA secure in the FPT-ROM.

Proof. We prove the theorem by constructing an adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ that exploits the first-preimage oracle of $G$, $\mathcal{FPO}_G$. The adversary $\mathcal{A}_1$, on input $pk$, submits $m_0 = 0^{k'}$ and $m_1 = 1^{k'}$ to the challenger. The adversary $\mathcal{A}_2$, on input the target ciphertext $(c_1^*, c_2^*)$, queries $c_2^*$ to the first-preimage oracle of $G$. If it obtains a preimage $\tilde{r}$, it checks whether $c_1^* = \mathsf{Enc}_{pk}(\tilde{r}; H(0^{k'}, \tilde{r}))$. If the check passes, the adversary outputs $b' = 0$. Otherwise, it flips a fair coin $b'$, outputs $b'$, and halts.

Clearly, if $b = 0$ and $\tilde{r} = r$, the adversary answers correctly, that is, it outputs $b' = b$. If $b = 1$, the preimage $\tilde{r}$ of the query $G(r) \oplus 1^{k'}$ never equals $r$, since $G(r) \neq G(r) \oplus 1^{k'}$. By perfect completeness, $c_1^*$ decrypts to $r \neq \tilde{r}$. Hence, the adversary's check fails if $b = 1$.

We estimate the probability that the adversary wins. By Lemma 1, with probability at least $1 - 2^{-2k'}$, there is no preimage set of size larger than $L$. Here $L = 5Ck'\ln 2/(\ln k' + \ln\ln 2) < 4Ck'/\ln k'$ if $C > 1$, and $L = 5k'\ln 2/(\ln k' + \ln\ln 2) < 4k'/\ln k'$ otherwise, for all sufficiently large $k'$.

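Let $\mathsf{Good}$ denote the event that $r \leftarrow \mathcal{FPO}_G(G(r))$, i.e., the first-preimage oracle returns the challenger's $r$. We then have $\Pr[\mathsf{Good}] \ge (1 - 2^{-2k'})/L$. Hence, we obtain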

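$$\begin{aligned}
\Pr[b' = b] &= \Pr[b' = 0 \mid b = 0 \wedge \mathsf{Good}]\,\Pr[b = 0 \wedge \mathsf{Good}] \\
&\quad + \Pr[b' = 0 \mid b = 0 \wedge \neg\mathsf{Good}]\,\Pr[b = 0 \wedge \neg\mathsf{Good}] \\
&\quad + \Pr[b' = 1 \mid b = 1]\,\Pr[b = 1] \\
&= 1\cdot\tfrac{1}{2}\Pr[\mathsf{Good}] + \tfrac{1}{2}\cdot\tfrac{1}{2}\,(1 - \Pr[\mathsf{Good}]) + \tfrac{1}{2}\cdot\tfrac{1}{2} \\
&= \tfrac{1}{2} + \tfrac{1}{4}\Pr[\mathsf{Good}] \ \ge\ \tfrac{1}{2} + \frac{1 - 2^{-2k'}}{4L},
\end{aligned}$$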

and 4L is a polynomial in the security parameter k. This completes the proof. □ 

As shown above, the FO conversion is not secure in the FPT-ROM, but there is a way 
to modify it so as to maintain the security in the FPT-ROM. Naito, Wang, and Ohta 





A. Kawachi et al. 


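Key Generation
Input: $1^k$
1: $(f_{pk}, g_{sk}) \leftarrow F$
Output: $(f_{pk}, g_{sk})$

Encryption
Input: $m \in \{0,1\}^{k-k_0-k_1}$, $f_{pk}$
1: $r \leftarrow \{0,1\}^{k_0}$
2: $s \leftarrow (m \| 0^{k_1}) \oplus G(r)$
3: $t \leftarrow H(s) \oplus r$
4: $c \leftarrow f_{pk}(s \| t)$
Output: $c$

Decryption
Input: $c$, $g_{sk}$
1: $s \| t \leftarrow g_{sk}(c)$
2: $r \leftarrow t \oplus H(s)$
3: $M \leftarrow s \oplus G(r)$
5: If $M = m \| 0^{k_1}$, set $o \leftarrow m$
6: Otherwise set $o \leftarrow \bot$
Output: $o$

Fig. 1. OAEP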


proposed a conversion method that converts a cryptosystem secure in the ROM into one that is secure even in the FPT-ROM [29]. In the case of the FO conversion, the public key is $(pk, c)$, where $c \leftarrow \{0,1\}^k$, and the ciphertext is

$$(c_1, c_2) = (\mathsf{Enc}_{pk}(r; H(c, m, r)),\ G(c, r) \oplus m),$$

where the domains of $H$ and $G$ are modified accordingly. Intuitively, this change makes the first-preimage oracles $\mathcal{FPO}_H$ and $\mathcal{FPO}_G$ useless.

3.4 OAEP 

We finally focus on OAEP and present its IND-CCA2 security in the FPT-ROM. For the security parameter $k$, let $k_0$ and $k_1$ be functions of $k$, where $k_0 < k - k_0$. Let $F$ be a family of partial-domain one-way trapdoor permutations over the domain $\{0,1\}^{k-k_0} \times \{0,1\}^{k_0}$. (See [30] for the definition of partial-domain one-wayness.)

Furthermore, let $G$ and $H$ be hash functions such that $G : \{0,1\}^{k_0} \to \{0,1\}^{k-k_0}$ and $H : \{0,1\}^{k-k_0} \to \{0,1\}^{k_0}$. Then, the OAEP encryption scheme based on $F$ is described in Fig. 1.
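The padding steps of Fig. 1 can be sketched in Python. The trapdoor permutation $f_{pk}$ and its inverse $g_{sk}$ are omitted, so the sketch only shows the transformation into and out of $s \| t$. The sizes $k = 512$ and $k_0 = k_1 = 128$ are hypothetical, SHA-256 in counter mode is an assumed stand-in for the random oracles $G$ and $H$, and the function names are illustrative.

```python
import hashlib
import secrets

# Hypothetical toy sizes in bits: k = 512, k0 = 128, k1 = 128, so messages
# have k - k0 - k1 = 256 bits, s has k - k0 = 384 bits, and t has k0 = 128 bits.
K, K0, K1 = 512, 128, 128

def _oracle(tag: bytes, value: int, in_bits: int, out_bits: int) -> int:
    """SHA-256 in counter mode, standing in for a random oracle with out_bits-bit output."""
    seed = tag + value.to_bytes((in_bits + 7) // 8, "big")
    out, counter = b"", 0
    while 8 * len(out) < out_bits:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return int.from_bytes(out, "big") >> (8 * len(out) - out_bits)

def G(r: int) -> int:  # G : {0,1}^{k0} -> {0,1}^{k-k0}
    return _oracle(b"G", r, K0, K - K0)

def H(s: int) -> int:  # H : {0,1}^{k-k0} -> {0,1}^{k0}
    return _oracle(b"H", s, K - K0, K0)

def oaep_pad(m: int) -> int:
    """Encryption steps 1-3 of Fig. 1; returns s || t, which would then be fed to f_pk."""
    assert 0 <= m < 1 << (K - K0 - K1)
    r = secrets.randbits(K0)
    s = (m << K1) ^ G(r)   # s = (m || 0^{k1}) xor G(r)
    t = H(s) ^ r           # t = H(s) xor r
    return (s << K0) | t

def oaep_unpad(x: int):
    """Decryption steps after inverting f_pk; returns m, or None for the rejection symbol."""
    s, t = x >> K0, x & ((1 << K0) - 1)
    r = t ^ H(s)
    M = s ^ G(r)
    if M & ((1 << K1) - 1):  # the low k1 bits must be the all-zero redundancy
        return None
    return M >> K1

x = oaep_pad(0xC0FFEE)
print(oaep_unpad(x) == 0xC0FFEE, oaep_unpad(x ^ 1))
```

Any modification of $s \| t$ changes $r$ or $s$, and therefore pseudo-randomizes the low $k_1$ bits of $M$. This is why a tampered input is rejected except with probability about $2^{-k_1}$.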

We obtain the following theorem that states the security of the OAEP encryption 
scheme in the FPT-ROM. 

Theorem 9. Let F be a family of partial-domain one-way trapdoor permutations. 
Then, the OAEP encryption scheme based on F is IND-CCA2 secure in the FPT-ROM. 

Here we give only a sketch of the security proof.

Proof (Sketch). As in the proof of Fujisaki et al. [30], we prove security by defining a sequence of games and bounding the adversary's advantage across the games. The games are almost the same as the original ones. However, we need to pay attention to two points. First, as mentioned, we no longer have the uniformity of the ROM because of the first-preimage oracle. Second, the adversary can make use of the first-preimage oracle. These points make the security proof difficult.

To see the difference between the security proofs in the FPT-ROM and the ROM, let us consider the following two games. We will describe the full sequence of games in the full version.

- Game$_1$: The challenger generates a key pair $(f_{pk}, g_{sk})$ by using the key-generation algorithm. It next chooses $r^+ \leftarrow \{0,1\}^{k_0}$ and obtains $g^+ \leftarrow \mathcal{RO}_G(r^+)$. When generating the target ciphertext, the challenger uses the random string $r^+$. The target ciphertext $y^*$ is generated as follows:

$$r^* \leftarrow r^+,\quad s^* \leftarrow (m_b \| 0^{k_1}) \oplus g^+,\quad t^* \leftarrow r^* \oplus \mathcal{RO}_H(s^*),\quad x^* \leftarrow (s^*, t^*),\quad y^* \leftarrow f_{pk}(x^*).$$

The ciphertext $y^*$ is given to $\mathcal{A}$. Finally, the adversary $\mathcal{A}$ outputs a bit $b'$.

- Game$_2$: We modify the above game by changing the rule for generating $g^+$. That is, $g^+$ is not obtained by querying the random oracle but is chosen uniformly at random from $\{0,1\}^{k-k_0}$. Notice that $(r^+, g^+)$ is not contained in the table $T_G$.

Let AskG be the event that $r^+$ is queried to $\mathcal{RO}_G$. The original proof in the ROM showed that, if $r^+$ is not queried to $\mathcal{RO}_G$, Game$_1$ and Game$_2$ are identical.

On the other hand, in our case in the FPT-ROM, even if the event AskG does not occur (that is, $r^+$ is not queried), we cannot say that Game$_1$ and Game$_2$ are identical.

First, the adversary could distinguish the games by querying $g^+$ to $\mathcal{FPO}_G$, which leads to a contradiction with partial-domain one-wayness in the final game. The value $g^+$ must have the preimage $r^+$ in Game$_1$, since $(r^+, g^+)$ is contained in the table $T_G$. In contrast, in Game$_2$ the value $g^+$ has no preimages with high probability if $k - k_0$ is much larger than $k_0$. This is because $(r^+, g^+)$ is not inserted in the table $T_G$, so $\bot \leftarrow \mathcal{FPO}_G(g^+)$ with high probability. We must take care of this event, AskG$^-$.

Second, the adversary could distinguish between Game$_1$ and Game$_2$ by querying $(m_{1-b} \| 0^{k_1}) \oplus s^*$ to $\mathcal{FPO}_G$, which also leads to a contradiction with partial-domain one-wayness in the final game. This event is denoted by AskG$^0$.

Conditioned on none of the events AskG, AskG$^-$, and AskG$^0$ occurring, $g^+$ is almost perfectly uniform in Game$_1$ by Lemma 2. Hence, we can show that Game$_1$ and Game$_2$ are statistically close if these events do not occur.

By carefully applying similar arguments, we can show the IND-CCA2 security of the OAEP encryption scheme in the FPT-ROM. □

4 Future Work 

It should be noted that our WROMs are based on a simplified variant of the original WROMs of Liskov [15], a variant also adopted by Numayama et al. [18] and by Pasini and Vaudenay [17].

The original WROMs consist of an ideal compression function $h : \{0,1\}^{k+k} \to \{0,1\}^k$ of fixed input length and the first-preimage oracle. Liskov then discussed, in the context of indifferentiability, the security of flexible-input-length hash functions $H^h : \{0,1\}^* \to \{0,1\}^k$ that employ $h$ as a component. A random oracle $H$ is often instantiated by employing a compression function $h$ (see, e.g., the survey in [8, Section 2]). Therefore, his work reflects the attacks against the compression functions of MD5 and SHA-1 rather than against the construction $H$.

In contrast, we (like [18, 17]) discussed the monolithic random oracle $H$ and the additional oracles associated with $H$. Hence, our model has, in some sense, a gap from such realistic instantiations of the random oracle. We leave filling this gap as future work.
Besides the FO conversion, there are several other conversion methods in the ROM, such as REACT [32] and GEM [33]. Examining the security of these conversion methods in the WROMs would also be interesting future work.
Abstract. We present a fully homomorphic encryption scheme which has both relatively small key and ciphertext size. Our construction follows that of Gentry by producing a fully homomorphic scheme from a “somewhat” homomorphic scheme. For the somewhat homomorphic scheme, the public and private keys consist of two large integers (one of which is shared by both the public and private key), and the ciphertext consists of one large integer. As such, our scheme has smaller message expansion and key size than Gentry’s original scheme. In addition, our proposal allows efficient fully homomorphic encryption over any field of characteristic two.


1 Introduction 

A fully homomorphic public key encryption scheme has been a “holy grail” of cryptography for a very long time. In the last year this problem has been solved by Gentry [7, 8] by using properties of ideal lattices. Various cryptographic schemes make use of lattices, sometimes just to argue about their security (such as NTRU), in other cases lattices are vital to understand the workings of
the scheme algorithms (such as [|5]). Gentry’s fully homomorphic scheme falls 
into the latter category. In this paper we present a fully homomorphic scheme 
which can be described using the elementary theory of algebraic number fields, 
and hence we do not require lattices to understand its encryption and decryption 
operations. However, our scheme does fall into the category of schemes whose 
best known attack is based on lattices. 

At a high level our scheme is very simple, and is mainly parametrized by an integer $N$ (there are other parameters which are less important). The public key consists of a prime $p$ and an integer $\alpha$ modulo $p$. The private key consists of either an integer $z$ (if we are encrypting bits) or an integer polynomial $Z(x)$ of degree $N - 1$ (if we are encrypting general binary polynomials of degree $N - 1$). To encrypt a message, one encodes the message as a binary polynomial and then randomizes it by adding two times a small random polynomial. To obtain the ciphertext, the resulting polynomial is evaluated at $\alpha$ modulo $p$. As such, the ciphertext is simply an integer modulo $p$, irrespective of whether we are encrypting bits or binary polynomials of degree $N - 1$.

To decrypt in the case where we know the message is a single bit, we multiply the ciphertext by $z$ and divide by $p$. We then round this rational number to the nearest integer and subtract the result from the ciphertext. The plaintext is then recovered by reducing this intermediate result modulo 2. When we are decrypting a binary polynomial, we follow the same procedure, but this time we multiply by the polynomial $Z(x)$ and divide by $p$ to obtain a rational polynomial. Rounding the coefficients of this polynomial to the nearest integer, subtracting from the original ciphertext, and reducing modulo two again recovers the plaintext.


2 Preliminaries 


2.1 Notation 

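Given a polynomial $g(x) = \sum_{i=0}^{N-1} g_i x^i \in \mathbb{Q}[x]$, we define the 2-norm and $\infty$-norm as

$$\|g(x)\|_2 = \sqrt{\sum_{i=0}^{N-1} g_i^2} \quad\text{and}\quad \|g(x)\|_\infty = \max_{0 \le i \le N-1} |g_i|.$$

For a positive value $r$, we define two corresponding types of “ball” centered at the origin:

$$B_{2,N}(r) = \left\{ \sum_{i=0}^{N-1} a_i x^i : \sum_{i=0}^{N-1} a_i^2 \le r^2 \right\}, \qquad B_{\infty,N}(r) = \left\{ \sum_{i=0}^{N-1} a_i x^i : -r \le a_i \le r \right\}.$$

We have the usual inclusions $B_{2,N}(r) \subset B_{\infty,N}(r)$ and $B_{\infty,N}(r) \subset B_{2,N}(\sqrt{N}\cdot r)$. We also define the following half-ball:

$$\left\{ \sum_{i=0}^{N-1} a_i x^i : 0 \le a_i \le r \right\}.$$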

All reductions modulo an odd integer $m$ in this paper are defined to result in a value in the range $[-(m-1)/2, \ldots, (m-1)/2]$. The notation $a \leftarrow b$ means assign the value on the right to the variable on the left. In contrast, $a \leftarrow_R A$, where $A$ is a set, means select $a$ from the set $A$ using a uniform distribution.
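The two norms and the centered reduction can be sketched in Python; the function names are illustrative. Python's `%` already returns a value in $[0, m)$, so one conditional subtraction moves it into the centered range.

```python
import math

def norm_2(g):
    """2-norm of a coefficient vector g = (g_0, ..., g_{N-1})."""
    return math.sqrt(sum(c * c for c in g))

def norm_inf(g):
    """Infinity-norm: the largest absolute coefficient."""
    return max(abs(c) for c in g)

def centered_mod(a: int, m: int) -> int:
    """Reduce a modulo an odd m into the range [-(m-1)/2, (m-1)/2]."""
    if m % 2 == 0:
        raise ValueError("m must be odd")
    r = a % m  # Python's % returns a value in [0, m)
    return r - m if r > (m - 1) // 2 else r

g = [3, -4, 0, 1]
print(norm_2(g), norm_inf(g), [centered_mod(a, 7) for a in range(7)])
```

The inclusions between the balls correspond to $\|g\|_\infty \le \|g\|_2 \le \sqrt{N}\,\|g\|_\infty$, which is easy to confirm on examples with these helpers.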
2.2 Ideals in Number Fields

Since the underlying workings of our scheme are based on prime ideals in a number field, we first recap some basic properties. See [3] for an introduction to the elementary computational number theory needed.

Let $K$ be a number field $\mathbb{Q}(\theta)$, where $\theta$ is a root of a monic irreducible polynomial $F(x) \in \mathbb{Z}[x]$ of degree $N$. Consider the equation order $\mathbb{Z}[\theta]$ inside the ring of integers $\mathcal{O}_K$. For our parameter choices we typically have $\mathcal{O}_K = \mathbb{Z}[\theta]$, but this need not be the case in general. Our scheme works with ideals in $\mathbb{Z}[\theta]$ that are assumed coprime with the index $[\mathcal{O}_K : \mathbb{Z}[\theta]]$, so there is little difference from working in $\mathcal{O}_K$.

These ideals can be represented in one of two ways: either by an $N$-dimensional $\mathbb{Z}$-basis or by a two-element $\mathbb{Z}[\theta]$-basis. To present an ideal $\mathfrak{a}$ by an $N$-dimensional $\mathbb{Z}$-basis, we give $N$ elements $\gamma_1, \ldots, \gamma_N \in \mathbb{Z}[\theta]$ such that $\mathfrak{a}$ is the $\mathbb{Z}$-module generated by $\gamma_1, \ldots, \gamma_N$. It is common practice to present this basis as an $N \times N$ matrix $(\gamma_{i,j})$, where $\gamma_i = \sum_{j=1}^{N} \gamma_{i,j}\theta^{j-1}$; i.e., we take a row-oriented formulation. Taking the Hermite Normal Form (HNF) of this basis produces a lower triangular basis whose leading diagonal $(d_1, \ldots, d_N)$ satisfies $d_{i+1} \mid d_i$. Note that this last property of the HNF of a basis only holds for matrices corresponding to ideals [S] (who use a different orientation).

However, every such ideal can also be represented by a $\mathbb{Z}[\theta]$-basis given by two elements $(\delta_1, \delta_2)$. In particular, one can always select $\delta_1$ to be an integer. For ideals lying above a rational prime $p$, it is very easy to write down a two-element representation. Suppose we factor $F(x)$ modulo $p$ into irreducible polynomials

$$F(x) = \prod_{i=1}^{t} F_i(x)^{e_i} \pmod{p}.$$

Then, for $p$ not dividing $[\mathcal{O}_K : \mathbb{Z}[\theta]]$, the prime ideals dividing $p\mathbb{Z}[\theta]$ are given by the two-element representation

$$\mathfrak{p}_i = (p, F_i(\theta)).$$

We define the residue degree of $\mathfrak{p}_i$ to be the degree $d_i$ of the polynomial $F_i(x)$. Reduction modulo $\mathfrak{p}_i$ produces a homomorphism

$$\rho_{\mathfrak{p}_i} : \mathbb{Z}[\theta] \to \mathbb{F}_{p^{d_i}}.$$

We will be particularly interested in prime ideals of residue degree one. These can be represented by the two-element representation $(p, \theta - \alpha)$, where $p$ is the norm of the ideal and $\alpha$ is a root of $F(x)$ modulo $p$. If $\chi \in \mathbb{Z}[\theta]$ is given by $\chi = \sum_{i=0}^{N-1} c_i \theta^i$, the homomorphism simply corresponds to evaluating the polynomial $\chi(x) = \sum_{i=0}^{N-1} c_i x^i$ at $\alpha$ modulo $p$.
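A small Python sketch of the residue-degree-one homomorphism, using the hypothetical example $F(x) = x^4 + 1$, $p = 17$, $\alpha = 2$. This is valid because $2^4 = 16 \equiv -1 \pmod{17}$. Elements of $\mathbb{Z}[\theta]$ are coefficient lists, multiplication is followed by reduction modulo $F$, and the map is evaluation at $\alpha$ modulo $p$. The loop checks that the map respects multiplication; the helper names are illustrative.

```python
import random

def polymul_mod(a, b, F):
    """(a * b) mod F for a monic F; coefficient lists run from low to high degree."""
    n = len(F) - 1
    prod = [0] * max(len(a) + len(b) - 1, n)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    for d in range(len(prod) - 1, n - 1, -1):  # cancel x^d for d >= n, using x^n = x^n - F(x)
        c = prod[d]
        if c:
            for k in range(n + 1):
                prod[d - n + k] -= c * F[k]
    return prod[:n]

def evaluate_mod(a, alpha, p):
    """The map Z[theta] -> F_p for the ideal (p, theta - alpha): a(alpha) mod p via Horner."""
    acc = 0
    for c in reversed(a):
        acc = (acc * alpha + c) % p
    return acc

# Hypothetical example: F(x) = x^4 + 1, p = 17, alpha = 2, since 2^4 = 16 = -1 (mod 17).
F, p, alpha = [1, 0, 0, 0, 1], 17, 2
rng = random.Random(1)
for _ in range(100):
    a = [rng.randint(-50, 50) for _ in range(4)]
    b = [rng.randint(-50, 50) for _ in range(4)]
    lhs = evaluate_mod(polymul_mod(a, b, F), alpha, p)
    assert lhs == evaluate_mod(a, alpha, p) * evaluate_mod(b, alpha, p) % p
print("evaluation at alpha respects multiplication in Z[theta]")
```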

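Given a prime ideal of the form $(p, \theta - \alpha)$, the corresponding HNF representation is very simple to construct and is closely related to the two-element representation, as we now show. We need to row reduce the $2N \times N$ matrix

$$\begin{pmatrix}
p & 0 & \cdots & 0 & 0 \\
0 & p & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 0 & p \\
-\alpha & 1 & \cdots & 0 & 0 \\
\vdots & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & -\alpha & 1 \\
-f_0 & -f_1 & \cdots & -f_{N-2} & -f_{N-1} - \alpha
\end{pmatrix},$$

where $F(x) = \sum_{i=0}^{N} f_i x^i$. It is not hard to see that the HNF of the above matrix is then given by

$$\begin{pmatrix}
p & 0 & 0 & \cdots & 0 \\
-\alpha & 1 & 0 & \cdots & 0 \\
-\alpha^2 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
-\alpha^{N-1} & 0 & 0 & \cdots & 1
\end{pmatrix},$$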

where all the integers in the first column, in rows two and onward, are taken 
modulo p. 
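Taking the hypothetical example $F(x) = x^4 + 1$, $p = 17$, $\alpha = 2$ (a root of $F$ modulo 17), the HNF basis can be written down directly. The sketch also checks each row against the membership criterion for a degree-one prime ideal: an element lies in $(p, \theta - \alpha)$ exactly when it evaluates to $0$ at $\alpha$ modulo $p$. The helper names are illustrative.

```python
def hnf_degree_one_prime(p: int, alpha: int, n: int):
    """HNF rows of the ideal (p, theta - alpha): (p, 0, ..., 0) and theta^i - (alpha^i mod p)."""
    rows = [[p] + [0] * (n - 1)]
    for i in range(1, n):
        row = [0] * n
        row[0] = -pow(alpha, i, p)  # first-column entry: -alpha^i, with alpha^i reduced mod p
        row[i] = 1
        rows.append(row)
    return rows

def in_ideal(vec, p: int, alpha: int) -> bool:
    """sum_j vec[j] theta^j lies in (p, theta - alpha) iff it evaluates to 0 at alpha mod p."""
    return sum(c * pow(alpha, j, p) for j, c in enumerate(vec)) % p == 0

# Hypothetical example: F(x) = x^4 + 1, p = 17, alpha = 2.
for row in hnf_degree_one_prime(17, 2, 4):
    print(row, in_ideal(row, 17, 2))
```

The diagonal is $(p, 1, \ldots, 1)$, so the determinant of the basis is $p$, matching the norm of a residue-degree-one prime.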

Recall that an ideal is called principal if it is generated by one element, i.e., we can write $\mathfrak{p} = (\gamma) = \gamma\cdot\mathbb{Z}[\theta]$. Given an HNF or two-element representation of an ideal, determining whether it is principal and finding a generator is considered a hard problem for growing $N$. Indeed, the best known algorithms (which are essentially equivalent to finding the class group and unit group of a number field) run in exponential time in the degree of the field. For fixed degree they run in sub-exponential time in the discriminant [2].

In addition, the generator of a principal ideal output by these algorithms will be very large. Indeed, this generator will typically be so large that writing it down as a polynomial in $\theta$ may itself take exponential time. Thus, finding a small generator of a principal ideal is possibly an even harder problem. Quantumly, finding a generator of a principal ideal is relatively easy; however, writing down a small generator is not known to be easy.

3 Our Somewhat Homomorphic Scheme 


In this section we present our somewhat homomorphic scheme and analyze for which parameter sets decryption works. To simplify the presentation, we present the scheme at this point as one which just encrypts elements of $\mathcal{P} = \{0,1\}$.
3.1 The Scheme

A somewhat homomorphic encryption scheme consists of five algorithms: {KeyGen, Encrypt, Decrypt, Add, Mult}. We shall describe each in turn; notice that the most complex phase is KeyGen. The scheme is parametrized by three values $(N, \eta, \mu)$. A typical set of parameters would be $(N, 2^{\sqrt{N}}, \sqrt{N})$. Later we shall return to discussing the effects of the sizes of these values on the security level $\lambda$ and on the performance of the scheme.

KeyGen():

- Set the plaintext space to be $\mathcal{P} = \{0,1\}$.
- Choose a monic irreducible polynomial $F(x) \in \mathbb{Z}[x]$ of degree $N$.
- Repeat:
  • $S(x) \leftarrow_R B_{\infty,N}(\eta/2)$.
  • $G(x) \leftarrow 1 + 2\cdot S(x)$.
  • $p \leftarrow \mathrm{resultant}(G(x), F(x))$.
- Until $p$ is prime.
- $D(x) \leftarrow \gcd(G(x), F(x))$ over $\mathbb{F}_p[x]$.
- Let $\alpha \in \mathbb{F}_p$ denote the unique root of $D(x)$.
- Apply the XGCD algorithm over $\mathbb{Q}[x]$ to obtain $Z(x) = \sum_{i=0}^{N-1} z_i x^i \in \mathbb{Z}[x]$ such that $Z(x)\cdot G(x) = p \bmod F(x)$.
- $B \leftarrow z_0 \pmod{2p}$.
- The public key is $PK = (p, \alpha)$, whilst the private key is $SK = (p, B)$.

Encrypt(M, PK):

- Parse PK as $(p, \alpha)$.
- If $M \notin \{0,1\}$ then abort.
- $R(x) \leftarrow_R B_{\infty,N}(\mu/2)$.
- $C(x) \leftarrow M + 2\cdot R(x)$.
- $c \leftarrow C(\alpha) \pmod{p}$.
- Output $c$.

Add($c_1$, $c_2$, PK):

- Parse PK as $(p, \alpha)$.
- $c_3 \leftarrow (c_1 + c_2) \pmod{p}$.
- Output $c_3$.

Decrypt(c, SK):

- Parse SK as $(p, B)$.
- $M \leftarrow (c - \lfloor c\cdot B/p \rceil) \pmod{2}$.
- Output $M$.

Mult($c_1$, $c_2$, PK):

- Parse PK as $(p, \alpha)$.
- $c_3 \leftarrow (c_1 \cdot c_2) \pmod{p}$.
- Output $c_3$.
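The five algorithms can be tried out in a toy Python sketch. Its choices are assumptions made for illustration, not the paper's parameters, and the toy offers no security:

- $F(x) = x^N + 1$ with $N = 8$.
- $S(x)$ has coefficients in $[-2^{20}, 2^{20}]$, i.e., $\eta = 2^{21}$. This is far larger than the typical $2^{\sqrt{N}}$, to leave room for one multiplication.
- $R(x)$ has coefficients in $\{-1, 0, 1\}$.
- The resultant is computed as the determinant of the multiplication-by-$G$ matrix, which equals $\mathrm{resultant}(G, F)$ for this $F$.
- $Z(x)$ is obtained by solving $Z(x)\,G(x) \equiv p$ as a linear system over $\mathbb{Q}$ instead of by XGCD.

The gcd over $\mathbb{F}_p[x]$ and the rounding in Decrypt follow the description above.

```python
import random
from fractions import Fraction

N = 8  # hypothetical toy degree; F(x) = x^N + 1 is used as the monic irreducible F

def is_probable_prime(n, rounds=25):
    """Miller-Rabin test with pseudo-random bases."""
    if n < 4 or n % 2 == 0:
        return n in (2, 3)
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    rng = random.Random(n)
    for _ in range(rounds):
        x = pow(rng.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def det_and_solve(M, b):
    """Gauss-Jordan elimination over Q: returns det(M) and x with M x = b."""
    n = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    det = Fraction(1)
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        if piv != col:
            A[col], A[piv], det = A[piv], A[col], -det
        det *= A[col][col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    return det, [A[i][n] / A[i][i] for i in range(n)]

def poly_gcd_mod(a, b, p):
    """gcd over F_p[x]; coefficient lists run from low to high degree."""
    def trim(f):
        while f and f[-1] == 0:
            f.pop()
        return f
    a, b = trim([c % p for c in a]), trim([c % p for c in b])
    while b:
        inv = pow(b[-1], -1, p)
        while len(a) >= len(b):  # a <- a mod b
            coef, shift = a[-1] * inv % p, len(a) - len(b)
            for i, bc in enumerate(b):
                a[shift + i] = (a[shift + i] - coef * bc) % p
            trim(a)
        a, b = b, a
    return a

def keygen(eta=2**21, rng=random):
    """KeyGen with F = x^N + 1; returns PK = (p, alpha) and SK = (p, B)."""
    F = [1] + [0] * (N - 1) + [1]
    while True:
        S = [rng.randint(-eta // 2, eta // 2) for _ in range(N)]  # S <- B_inf,N(eta/2)
        G = [1 + 2 * S[0]] + [2 * s for s in S[1:]]                # G = 1 + 2 S
        # Column j holds x^j G(x) mod x^N + 1, so det(M) = resultant(G, F) > 0 here.
        M = [[G[i - j] if i >= j else -G[i - j + N] for j in range(N)] for i in range(N)]
        det, x = det_and_solve(M, [1] + [0] * (N - 1))
        p = int(det)
        if is_probable_prime(p):
            break
    Z = [int(det * xi) for xi in x]         # Z(x) G(x) = p mod F(x), integer coefficients
    D = poly_gcd_mod(G, F, p)               # degree one, since p is prime
    alpha = -D[0] * pow(D[1], -1, p) % p    # the unique root of D(x)
    return (p, alpha), (p, Z[0] % (2 * p))  # B = z_0 mod 2p

def encrypt(m, pk, rng=random, r_bound=1):
    """c = C(alpha) mod p with C(x) = m + 2 R(x); R has coefficients in [-r_bound, r_bound]."""
    p, alpha = pk
    if m not in (0, 1):
        raise ValueError("plaintext must be a bit")
    R = [rng.randint(-r_bound, r_bound) for _ in range(N)]
    c = 0
    for coeff in reversed([m + 2 * R[0]] + [2 * r for r in R[1:]]):
        c = (c * alpha + coeff) % p  # Horner evaluation of C(x) at alpha
    return c

def decrypt(c, sk):
    """M = (c - round(c * B / p)) mod 2, rounding to the nearest integer."""
    p, B = sk
    return (c - (2 * c * B + p) // (2 * p)) % 2

def add(c1, c2, pk):
    return (c1 + c2) % pk[0]

def mult(c1, c2, pk):
    return c1 * c2 % pk[0]
```

For example, after `pk, sk = keygen(rng=random.Random(7))`, the value `decrypt(mult(encrypt(1, pk), encrypt(1, pk), pk), sk)` should be 1. For these toy sizes the noise in the product stays well inside the bound $\|C(\theta)\cdot Z(\theta)/p\|_\infty < 1/2$ discussed in the analysis of Decrypt.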


3.2 Analysis 

In this section we analyze for which parameter sets our scheme is correct and 
also determine how many homomorphic operations can be performed before 
decryption will fail. 
KeyGen algorithm. KeyGen generates an element $\gamma = G(\theta)$ of prime norm $p$ in the number field $K$ defined by $F(x)$. As such, we have constructed a small generator of the degree-one prime ideal $\mathfrak{p} = \gamma\cdot\mathbb{Z}[\theta]$. To find the two-element representation of $\mathfrak{p}$, we need to select the correct root $\alpha$ of $F(x)$ modulo $p$. Since $\gamma = G(\theta) \in \mathfrak{p}$, we have $G(\alpha) \equiv 0 \pmod{p}$, so $G(x)$ and $F(x)$ have at least one common root modulo $p$. Furthermore, there is precisely one common root, since otherwise $\gamma$ would generate two different prime ideals, which is clearly impossible. This explains why $D(x)$ has degree one: we use $D(x)$ to select the precise root of $F(x)$ that corresponds to the ideal $\mathfrak{p}$ generated by $\gamma$. The two-element representation of the ideal $\mathfrak{p}$ then simply is $\mathfrak{p} = p\cdot\mathbb{Z}[\theta] + (\theta - \alpha)\mathbb{Z}[\theta]$.

Encrypt algorithm. The message $M$ is added to twice a small random polynomial
$R(x)$, resulting in a polynomial $C(x)$. The $\infty$-norm of the polynomial $R(x)$ is
controlled by the parameter $\mu$. Encryption then simply equals reduction of $C(\theta)$
modulo $\mathfrak{p}$ using the public two element representation $(p, \theta - \alpha)$. As explained
before, this simply corresponds to evaluating $C(x)$ in $\alpha$ modulo $p$. Furthermore,
note that this precisely implies that $C(\theta) - c \in \mathfrak{p}$.

Decrypt algorithm. By definition of encryption, we have that $C(\theta) - c \in \mathfrak{p}$ and
$\mathfrak{p}$ is principal and generated by $\gamma = G(\theta)$. Hence, we can write

$$C(\theta) - c = q(\theta) \cdot \gamma,$$

with $q(\theta) \in \mathbb{Z}[\theta]$. It is clear that if we recover the element $C(\theta)$, then decryption
will work since $C(\theta) = M + 2 \cdot R(\theta)$. Note that $\gamma^{-1}$ is precisely given by $Z(\theta)/p$,
where $Z$ was computed in KeyGen. Dividing by $\gamma$ therefore leads to the following
equality

$$-c \cdot Z(\theta)/p = q(\theta) - C(\theta) \cdot Z(\theta)/p.$$

The above equation shows that if $\|C(\theta) \cdot Z(\theta)/p\|_\infty < 1/2$, then simply rounding
the coefficients of $-c \cdot Z(\theta)/p$ will result in the correct quotient $q(\theta)$. This will
allow for correct decryption by computing $C(\theta) = c + q(\theta) \cdot \gamma$. The crucial part
therefore is to obtain a bound on $\|Z(x)\|_\infty$.

Lemma 1. Let $F(x), G(x) \in \mathbb{Z}[x]$ with $F(x)$ monic, $\deg(F) = N$, $\deg(G) = M < N$
and $\mathrm{resultant}(F, G) = p$; then there exists a polynomial $Z(x) \in \mathbb{Z}[x]$ with
$Z(x) \cdot G(x) = p \bmod F(x)$ and

$$\|Z(x)\|_\infty \le \|G(x)\|_2^{N-1} \cdot \|F(x)\|_2^{M}.$$

PROOF: Over $\mathbb{Q}[x]$, we have that $\gcd(G(x), F(x)) = 1$, so there exist polynomials
$S(x), T(x) \in \mathbb{Q}[x]$ with $\deg(S) < N$ and $\deg(T) < M$ such that
$S(x) \cdot G(x) + T(x) \cdot F(x) = 1$. It is well known (see for instance Corollary
6.15 of [6]) that the polynomials $S$ and $T$ are given by $S = \sum_{i=0}^{N-1} s_i x^i$ and
$T = \sum_{i=0}^{M-1} t_i x^i$, where the $s_i$ and $t_i$ are the solutions of

$$\mathrm{Syl}(G, F)^T \cdot (s_{N-1}, \ldots, s_0, t_{M-1}, \ldots, t_0)^T = (0, \ldots, 0, 1)^T,$$

where $\mathrm{Syl}(G, F)$ is the Sylvester matrix of $G$ and $F$. The resultant is precisely
$\det(\mathrm{Syl}(G, F)) = p$, so by Cramer's rule we find an explicit expression for the
coefficients $s_i$, namely, the determinant of a submatrix of $\mathrm{Syl}(G, F)^T$ (remove
one of the columns containing the coefficients of $G$ and the last row) divided by
$p$. Since $Z(x) = p \cdot S(x)$, each coefficient $z_i$ is exactly such a determinant. Using
Hadamard's inequality to bound the determinant of such submatrices, we finally
conclude that $|z_i| \le \|G\|_2^{N-1} \cdot \|F\|_2^{M}$. □


In the remainder of the paper we will assume that $M = N - 1$, which will happen
with very high probability.


Define

$$\delta_\infty := \sup \left\{ \frac{\|g(x) \cdot h(x) \bmod F(x)\|_\infty}{\|g(x)\|_\infty \cdot \|h(x)\|_\infty} : \deg(g), \deg(h) < N \right\}.$$

We then have that

$$\|g(\theta) \cdot h(\theta)\|_\infty \le \delta_\infty \cdot \|g\|_\infty \cdot \|h\|_\infty,$$

where $\deg(g), \deg(h) < N$. Gentry [8, Section 7.4] derives several bounds on the
above quantity, but for the 2-norm, and it is easy to obtain the equivalent bounds
for the $\infty$-norm. To illustrate the two extreme cases, i.e. that $\delta_\infty$ can range from
fully exponential in $N$ to linear in $N$, we give the following lemma, which also
motivates why we propose to use $F(x) = x^N + 1$ in practice.

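Lemma 2. Let $F_1(x) = x^N - a$ and $F_2(x) = x^N - a \cdot x^{N-1}$; then

$$\delta_\infty(F_1) \le |a| \cdot N \quad \text{and} \quad \delta_\infty(F_2) \le N \cdot \sum_{i=0}^{N-1} |a|^i.$$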

PROOF: Let $g = \sum_{i=0}^{N-1} g_i x^i$ and $h = \sum_{i=0}^{N-1} h_i x^i$; then

$$g \cdot h \bmod F_1 = \sum_{k=0}^{N-1} \Big( \sum_{0 \le i \le k} g_i h_{k-i} + a \sum_{k < i < N} g_i h_{N+k-i} \Big) x^k,$$

from which the bound on $\delta_\infty(F_1)$ immediately follows. Similarly, write
$g \cdot h = \sum_{k=0}^{2N-2} c_k x^k$; then $g \cdot h \bmod F_2 = \sum_{k=0}^{N-1} d_k x^k$ with $d_k = c_k$ for
$k = 0, \ldots, N-2$ and

$$d_{N-1} = \sum_{i=0}^{N-1} c_{N-1+i} \, a^i.$$

Since all $c_i$ clearly are bounded by $N \cdot \|g\|_\infty \cdot \|h\|_\infty$, the bound on $\delta_\infty(F_2)$ follows.

□
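The polynomial $x^N + 1$ is $F_1$ with $a = -1$, so Lemma 2 gives $\delta_\infty \le N$. The short Python check below (helper names are hypothetical) shows that the all-ones pair attains this bound and that random pairs never exceed it; this is the reason $\delta_\infty = N$ is used for $F(x) = x^N + 1$ later on.

```python
import random

def mul_mod_xn_plus_1(g, h):
    """Coefficients of g(x) * h(x) mod x^N + 1 for length-N coefficient lists."""
    n = len(g)
    out = [0] * n
    for i, gi in enumerate(g):
        for j, hj in enumerate(h):
            if i + j < n:
                out[i + j] += gi * hj
            else:
                out[i + j - n] -= gi * hj        # reduce with x^N = -1
    return out

def ratio(g, h):
    """||g * h mod F||_inf / (||g||_inf * ||h||_inf) for F(x) = x^N + 1."""
    top = max(map(abs, mul_mod_xn_plus_1(g, h)))
    return top / (max(map(abs, g)) * max(map(abs, h)))

def random_poly(n):
    """Random polynomial with nonzero coefficients in [-9, 9]."""
    return [random.choice((-1, 1)) * random.randint(1, 9) for _ in range(n)]

N = 8
random.seed(0)
worst = max(ratio(random_poly(N), random_poly(N)) for _ in range(1000))
print(ratio([1] * N, [1] * N))   # 8.0, i.e. delta_inf = N is attained
print(worst <= N)                # True
```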


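From this we can conclude that

$$\left\| \frac{C(\theta) \cdot Z(\theta)}{p} \right\|_\infty \le \frac{\delta_\infty \cdot \|C(x)\|_\infty \cdot \|G(x)\|_2^{N-1} \cdot \|F(x)\|_2^{N-1}}{p},$$

so decryption will work as long as

$$\|C(x)\|_\infty < \frac{p}{2 \cdot \delta_\infty \cdot \|G(x)\|_2^{N-1} \cdot \|F(x)\|_2^{N-1}} =: r_{\mathrm{Dec}}.$$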


Note that the expected value of $r_{\mathrm{Dec}}$ will be roughly $\|G\|_2/(2 \cdot \delta_\infty)$, since the
resultant $p$ will be about $\|G\|_2^N \cdot \|F\|_2^{N-1}$. So for $\|C\|_\infty < r_{\mathrm{Dec}}$, we have

$$C(\theta) = c + q(\theta) \cdot \gamma = c - \lfloor c \cdot Z(\theta)/p \rceil \cdot \gamma,$$

and since $M \equiv C(\theta) \bmod 2$ and $\gamma \equiv 1 \bmod 2$, taking the constant coefficient we
finally obtain the simplified decryption function

$$M = c - \lfloor c \cdot B/p \rceil \bmod 2,$$

where $B$ is $z_0$. Note, we can take $B$ as $z_0$ modulo $2p$ as we are only interested
in rounding $c \cdot B/p$ to the nearest integer and then taking the result modulo 2:
replacing $z_0$ by $z_0 + 2pk$ changes $c \cdot z_0/p$ by the even integer $2ck$.
Furthermore, Lemma 1 implies that all coefficients of $Z(x)$ typically will be
smaller than $p$, since $p = \mathrm{resultant}(F, G)$ and thus $p \approx \|G(x)\|_2^N \cdot \|F(x)\|_2^{N-1}$. This
means that the reduction modulo $2p$ in the key generation will have no effect in
most cases. However, it will turn out to be a necessary assumption in assuring
a uniform distribution when we switch to the fully homomorphic scheme.
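A quick numerical check of the claim that only $z_0 \bmod 2p$ matters: replacing $z_0$ by $z_0 - 2pk$ shifts $c \cdot z_0/p$ by the even integer $2ck$, so the decrypted bit cannot change. The modulus and the values of $z_0$ below are hypothetical, and rounding uses exact integer arithmetic rather than floating point.

```python
import random

def round_div(a, b):
    """Nearest integer to a / b (for b > 0), with exact integer arithmetic."""
    return (2 * a + b) // (2 * b)

def dec_bit(c, z0, p):
    """Simplified decryption (c - round(c * z0 / p)) mod 2."""
    return (c - round_div(c * z0, p)) % 2

random.seed(1)
p = 1_000_003                                   # hypothetical odd modulus
mismatches = 0
for _ in range(10_000):
    c = random.randrange(p)
    z0 = random.randint(-10 ** 12, 10 ** 12)    # hypothetical z_0
    mismatches += dec_bit(c, z0, p) != dec_bit(c, z0 % (2 * p), p)
print(mismatches)                                # 0
```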

For our KeyGen algorithm we have that each coefficient of $G$ has size approximately
$\eta$, which implies that we have the estimate

$$r_{\mathrm{Dec}} \approx \frac{\sqrt{N} \cdot \eta}{2 \cdot \delta_\infty}.$$

For $F(x) = x^N + 1$ we thus obtain the estimate $r_{\mathrm{Dec}} \approx \eta/(2 \cdot \sqrt{N})$. In the
remainder of the paper we will also sometimes use $r_{\mathrm{Enc}}$ instead of $\mu$. Note that
if one wants to compare with Gentry's scheme, one should take into account
that our bounds are formulated for the $\infty$-norm, whereas Gentry works with the
2-norm.

Add and Mult algorithms. It is clear that both algorithms are correct. However,
we need to consider how the error values propagate as we apply Add and
Mult. In particular, decryption of $c = C(\alpha)$ will work for a polynomial $C(x)$
if $C(x) \in \mathcal{B}_{\infty,N}(r_{\mathrm{Dec}})$. However, as we apply Add and Mult to a ciphertext,
the value of $C(x)$ starts to lie in balls of larger and larger radius. As soon as
$C(x) \notin \mathcal{B}_{\infty,N}(r_{\mathrm{Dec}})$, we are no longer guaranteed to be able to decrypt correctly.
This is why our basic scheme is only somewhat homomorphic, since we are only
able to apply Add and Mult a limited number of times.

Let $c_1$ and $c_2$ denote two ciphertexts, corresponding to two randomizations
$C_1(x) = M_1 + N_1(x)$ and $C_2(x) = M_2 + N_2(x)$, where $M_i \in \{0,1\}$ are the
messages and $N_i(x) \in \mathcal{B}_{\infty,N}(r_i - 1)$ is the randomness, i.e. $C_i(x) \in \mathcal{B}_{\infty,N}(r_i)$.
We let

$$C_3(x) = M_3 + N_3(x) = (M_1 + N_1(x)) + (M_2 + N_2(x)),$$
$$C_4(x) = M_4 + N_4(x) = (M_1 + N_1(x)) \cdot (M_2 + N_2(x)) \bmod F(x),$$

where $M_3, M_4 \in \{0,1\}$. Then

$$C_3(x) \in \mathcal{B}_{\infty,N}(r_1 + r_2) \quad \text{and} \quad C_4(x) \in \mathcal{B}_{\infty,N}(\delta_\infty \cdot r_1 \cdot r_2).$$

Initially we start with a ciphertext with $C(x)$ lying in $\mathcal{B}_{\infty,N}(\mu + 1)$. After executing
a circuit with multiplicative depth $d$, we expect the ciphertext to correspond
to a polynomial $C'(x)$ lying in a ball $\mathcal{B}_{\infty,N}(r)$ with

$$r \approx (\delta_\infty \cdot \mu)^{2^d}.$$


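Thus we can only decrypt the output of such a circuit if $r < r_{\mathrm{Dec}}$, i.e.

$$d \log 2 < \log\log r_{\mathrm{Dec}} - \log\log(\delta_\infty \cdot \mu) \approx \log\log\left(\frac{\sqrt{N} \cdot \eta}{2 \cdot \delta_\infty}\right) - \log\log(\delta_\infty \cdot \mu).$$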
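The approximation for $r$ can be made explicit by unrolling the multiplication bound for a circuit that squares a fresh ciphertext $d$ times (the fastest growth at depth $d$, ignoring additions). Writing $r_0 = \mu + 1$ for the fresh radius, a short worked derivation gives:

```latex
r_0 = \mu + 1, \qquad r_{k+1} = \delta_\infty \cdot r_k^2
\quad\Longrightarrow\quad
r_d = \delta_\infty^{2^d - 1} \cdot r_0^{2^d} \le (\delta_\infty \cdot r_0)^{2^d},

(\delta_\infty \cdot r_0)^{2^d} < r_{\mathrm{Dec}}
\iff 2^d \log(\delta_\infty \cdot r_0) < \log r_{\mathrm{Dec}}
\iff d \log 2 < \log\log r_{\mathrm{Dec}} - \log\log(\delta_\infty \cdot r_0).
```

Replacing $r_0 = \mu + 1$ by $\mu$ recovers the estimate above; the last equivalence assumes $\delta_\infty \cdot r_0 > 1$.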


4 Security Analysis 

We consider three aspects of security: key recovery, onewayness of the encryption
and semantic security. Whilst semantic security is based on what might at first
appear a non-traditional problem, the other two notions of security are related
to well studied problems in number theory. This is similar to other notions in
cryptography; for example key recovery in ElGamal is related to the DLP,
and semantic security to the relatively obscure (for mathematicians) DDH
problem. However, we first show that our scheme is in some sense a specialisation
and optimization of Gentry’s scheme.

Link With Gentry’s Scheme. To discuss the security in more detail, we
first show that our scheme is a specialisation and simplification of the lattice
based scheme of Gentry [7]. The generator $\gamma$ in our scheme is equivalent to the
private basis of the ideal $J$ in Gentry’s scheme; the public basis is then the two
element representation $(p, \theta - \alpha)$. The ideal $I$ of Gentry’s scheme is simply set
to the principal ideal $(2)$. Therefore, we see that KeyGen is a specialised form
of KeyGen for Gentry’s scheme: in particular we use the compact two element
representation $(p, \alpha)$ of the public basis, instead of the larger HNF representation
as Gentry does.

We now turn to the encryption algorithm. The element $C(\theta) = M(\theta) + 2 \cdot R(\theta)$
is precisely the value of $\psi'$ computed in Gentry’s encryption algorithm, with a
value of $r_{\mathrm{Enc}}$ (in the 2-norm) equal to $\sqrt{N} \cdot \mu$. Gentry then produces his ciphertext
$\psi$ by reducing $\psi'$ modulo the ideal $J$ using the HNF basis. It is at this point
that we seem to depart from Gentry’s presentation: we actually compute the
reduction of $\psi'$ modulo $\mathfrak{p}$ using the public two element representation. Given
$\psi'$ as a polynomial in $\theta$, this involves replacing $\theta$ by $\alpha$ and reducing the result
modulo $p$. So given $C(x)$, we produce $c$ by simply computing $c = C(\alpha) \bmod p \in \mathbb{F}_p$.
However, given our earlier discussion on the HNF of the ideal given by $(p, \theta - \alpha)$,
we see that the two reduction algorithms are equivalent when we are working in
the equation order $\mathbb{Z}[\theta]$.

Hence, we conclude that our scheme is a specialisation of Gentry’s scheme. For 
the given specialisation our key sizes are much smaller than Gentry’s, whilst our 
ciphertexts are the same size. When compared to the full generality of Gentry’s 
scheme our ciphertexts are also much smaller than Gentry’s. The link between 
the two schemes, and the relative simplicity of our scheme, may help shed light 
on parameter choices in Gentry’s original scheme. 

Key Recovery. Recall the public key in our scheme consists of a principal
degree one prime ideal in two element representation, whilst the private key
consists of the inverse of a small generator of this principal prime ideal. To see
that the generator $\gamma$ is small, notice that the polynomial $G(x)$ has an $\infty$-norm
given roughly by $\eta$, whereas the size of $p$ is roughly $(\sqrt{N} \cdot \eta)^N \cdot \|F\|_2^{N-1}$. Recovering
the private key given the public key is therefore an instance of the small principal
ideal problem:

Definition 1 (Small Principal Ideal Problem (SPIP)). Given a principal
ideal $\mathfrak{a}$ in either two element or HNF representation, compute a “small” generator
of the ideal.

This is one of the core problems in computational number theory and has formed
the basis of previous cryptographic proposals, see for example [3]. There are
currently two approaches to the above problem. The first approach is a deterministic
method based on the Baby-Step/Giant-Step method attributed to (Tj. This takes 
time 

$$N^{O(N)} \cdot \sqrt{\min(\lambda, R)} \cdot |\Delta|^{o(1)},$$

where $\Delta$ is the discriminant of $\mathbb{Z}[\theta]$, $R$ is the regulator and $\lambda = \min_i |\log|\gamma_i||$
is the minimal logarithmic embedding of $\gamma$. Clearly $\lambda$ can itself be bounded by
$\eta$, a minor detail which we leave to the reader.

The second approach to this problem is via Buchmann’s sub-exponential algorithm
for units and class groups, which is described in [2] and [1, Chapter 6].
This method has complexity

$$\exp\left(O(N \log N) \cdot \sqrt{\log|\Delta| \cdot \log\log|\Delta|}\right),$$

where again $\Delta$ is the discriminant of the order $\mathbb{Z}[\theta]$. However, this method is
likely to produce a generator of large height, i.e. with large coefficients. Indeed,
so large that writing the obtained generator down as a polynomial in $\theta$ may
take exponential time.

In conclusion, determining the private key given only the public key is an instance
of a classical and well studied problem in algorithmic number theory. In particular,
there are no known efficient solutions for this problem, and the only sub-exponential
method does not find a solution which is equivalent to our private key.

Onewayness of Encryption. In this section we consider the problem of recovering
a message given a ciphertext element. It is readily seen that this is
equivalent to solving the following problem: Given $p$, $\alpha$ and $c \in \mathbb{F}_p$, find $x_i$ for
$i = 0, \ldots, N-1$, such that

$$c = \sum_{i=0}^{N-1} x_i \alpha^i + k \cdot p,$$

where $|x_i| \le r_{\mathrm{Enc}}$, for some integer value of $k$.

To recast this as a lattice problem, consider the lattice generated by the rows
of the matrix $H$ given earlier. Consider the lattice vector

$$(k, -x_1, \ldots, -x_{N-1}) \cdot H = (c - x_0, -x_1, \ldots, -x_{N-1}).$$

This is a lattice vector which is very close (within $r_{\mathrm{Enc}}$ in the $\infty$-norm, or $\sqrt{N} \cdot r_{\mathrm{Enc}}$
in the 2-norm) to the non-lattice vector $(c, 0, \ldots, 0)$. Hence, determining the
underlying plaintext given the ciphertext is an instance of the closest vector
problem.

However, the underlying lattice is a well-studied lattice in algorithmic number 
theory, see for example the applications of LLL described in panes]. A lattice 
generated by a matrix such as $H$, namely a matrix in Hermite Normal Form in
which all but one diagonal entry is equal to one, is probably the most studied
lattice problem from the computational perspective in number theory. Thus,
whilst we are unable to make use of modern worst-case/average-case reductions
for our scheme, the underlying lattice problem is well studied.

However, for later use, we will recap the analysis Gentry has given for
this problem, bearing in mind that Gentry’s analysis is for a general lattice
arising from the HNF of an ideal and not for the specific one in our
scheme. The best known attack on Gentry’s scheme is one of lattice reduction,
related to the bounded distance decoding problem (BDDP). In particular it is
related to finding short/closest vectors within a multiplicative factor of $r_{\mathrm{Dec}}/r_{\mathrm{Enc}}$
in a lattice of dimension $N$. If we set

$$\frac{r_{\mathrm{Dec}}}{r_{\mathrm{Enc}}} = 2^{\epsilon},$$

then it is believed that solving BDDP has difficulty $2^{N/\epsilon}$ (see [8, Section 7.7]). We
shall refer to the value $2^{N/\epsilon}$ as the security level of our somewhat homomorphic
scheme.

Semantic Security. Finally we discuss the semantic security of our somewhat
homomorphic encryption scheme. Consider the following distinguishing problem:


Definition 2 (Polynomial Coset Problem (PCP)). The challenger first
selects $b \leftarrow_R \{0,1\}$ and runs KeyGen as above to obtain a value of $\alpha$ and $p$. If
$b = 0$ then the challenger performs

- $R(x) \leftarrow_R \mathcal{B}_{\infty,N}(r_{\mathrm{Enc}})$.

- $r \leftarrow R(\alpha) \pmod{p}$.

Whilst if $b = 1$ the challenger performs

- $r \leftarrow_R \mathbb{F}_p$.

Given $(r, PK)$ the problem is to guess whether $b = 0$ or $b = 1$.

We call the problem the Polynomial Coset Problem as it is akin to Gentry’s
Ideal Coset Problem from [7]. The problem basically says one cannot determine
whether $r$ is the evaluation of some small polynomial at $\alpha$ or is a random value
modulo $p$. Note that the size of the space $\mathcal{B}_{\infty,N}(r_{\mathrm{Enc}})$ is roughly $r_{\mathrm{Enc}}^N$, whereas
$\mathbb{F}_p$ has size roughly $\eta^N$. So if $r_{\mathrm{Enc}}$ is much smaller than $\eta$, we are trying to distinguish
a relatively small space within a larger one. Note that in the case where $b = 0$
we generate the value $R(x)$ from $\mathcal{B}_{\infty,N}(r_{\mathrm{Enc}})$ as opposed to $\mathcal{B}_{\infty,N}(r_{\mathrm{Dec}})$, since
we are interested in arguing about semantic security for what are the simplest
ciphertexts to break.

The proof of the following theorem closely follows the proof of Theorem 7
of [7], but we include it here for completeness.

Theorem 1. Suppose there is an algorithm A which breaks the semantic security
of our somewhat homomorphic scheme with advantage $\epsilon$. Then there is an
algorithm B, running in about the same time as A, which solves the PCP with
advantage $\epsilon/2$.

PROOF: The algorithm B creates a challenge ciphertext for algorithm A from
its own challenge $(r, PK)$ by setting

$$c \leftarrow (M_\beta(\alpha) + 2 \cdot r) \pmod{p},$$

where $M_0$ and $M_1$ are A’s two challenge messages and $\beta \leftarrow_R \{0,1\}$ is B’s choice
of a challenge bit. A sends back a guess $\beta'$ for $\beta$ and B returns $\beta \oplus \beta'$.

When $b = 0$ in the PCP problem, it is clear that the challenge ciphertext $c$
has the correct distribution, so B obtains the same advantage as A, namely $\epsilon$.
When $b = 1$, $r$ is uniformly random modulo $p$ and since $p$ is odd, $2r$ is uniformly
random modulo $p$ and therefore so is $c$. Hence, the advantage of A is 0, which
implies that B’s overall advantage is $\epsilon/2$. □
5 A Fully Homomorphic Scheme

We now proceed to turning the somewhat homomorphic scheme into a fully 
homomorphic scheme. Since we have shown that our scheme is a specialisation 
of Gentry’s scheme, we only need to recast Gentry’s method for our parameters. 
Indeed we can simplify the method somewhat, since our ciphertext is an integer 
rather than a vector. We assume that our scheme is secure under key dependent 
encryptions, purely to keep the notation simpler; to deal with the more general 
case is immediate from our discussion. 

At a high level we need to define a new algorithm called Recrypt, which takes
a ciphertext $c$ and re-encrypts it to $c_{\mathrm{new}}$, whilst at the same time removing some
of the errors in $c$. Intuitively this takes a “dirty ciphertext” $c$ and “cleans it” to
obtain the ciphertext $c_{\mathrm{new}}$.

To do this we augment the encryption key with some additional information,
by extending the algorithm KeyGen with the following additional operations,
based on two integer parameters $s_1$ and $s_2$. We make use of the fact that we are
only interested in the coefficients of $Z(x)$ modulo $2p$.

- Generate $s_1$ uniformly random integers $B_i$ in $[-p, \ldots, p]$ such that there
exists a subset $S$ of $s_2$ elements with

$$\sum_{i \in S} B_i = B$$

over the integers.

- Define $\mathrm{sk}_i = 1$ if $i \in S$ and 0 otherwise. Notice that only $s_2$ of the bits $\{\mathrm{sk}_i\}$
are set to one.

- Encrypt the bits $\mathrm{sk}_i$ under the somewhat homomorphic scheme to obtain
$c_i \leftarrow \mathrm{Encrypt}(\mathrm{sk}_i, PK)$.

- The public key now consists of

$$PK = (p, \alpha, s_1, s_2, \{c_i, B_i\}_{i=1}^{s_1}).$$
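One simple way to produce such $B_i$ is rejection sampling: draw all $s_1$ values, then overwrite one element of a random $s_2$-subset so that the subset sums to $B$, and retry if that element leaves $[-p, p]$. The Python below is a hypothetical sketch of the stated property only; it is not claimed to produce exactly the distribution assumed in the sparse subset-sum security argument.

```python
import random

def split_secret(b, p, s1, s2):
    """Hypothetical helper: s1 integers in [-p, p] with a hidden s2-subset summing to b.

    Returns the integers B_i and the indicator bits sk_i of the hidden subset.
    """
    while True:
        bs = [random.randint(-p, p) for _ in range(s1)]
        subset = random.sample(range(s1), s2)
        last = subset[-1]
        bs[last] = b - sum(bs[i] for i in subset[:-1])   # force the subset sum
        if -p <= bs[last] <= p:                          # otherwise try again
            return bs, [int(i in subset) for i in range(s1)]

random.seed(3)
bs, sk = split_secret(b=12345, p=10007, s1=40, s2=5)
print(sum(sk), sum(bi * ski for bi, ski in zip(bs, sk)))   # 5 12345
```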

We can now describe the re-encryption operation.

Recrypt(c, PK): This algorithm takes as input a “dirty” ciphertext $c$, and then
produces a “cleaner” ciphertext $c_{\mathrm{new}}$ of the same message, but with fewer “errors”
in its randomization vector. The re-encryption works by performing a homomorphic
decryption on an encryption of the ciphertext’s bits. In the Appendix we
explain the Recrypt algorithm in detail and analyse precisely how complicated
it is for possible real-life values.

Note that we have

$$B = \sum_{i=1}^{s_1} \mathrm{sk}_i \cdot B_i,$$

hence we will now require that this additional information in the public key does
not compromise the security of the scheme. Gentry reduces this security issue
to the decisional version of the sparse subset-sum problem (SSSP), and hence
the same assumption needs to be made in our situation. The SSSP problem is
believed to take at least $\sqrt{\binom{s_1}{s_2}} \ge (s_1/s_2)^{s_2/2}$ steps to solve, assuming we are not
in a low density subset sum, i.e. $s_1/\log p > 1$. If we take $s_1$ to be slightly greater
than $\log p$, then we need to select $s_2$ such that

$$\sqrt{\binom{s_1}{s_2}} > 2^{N/\epsilon},$$

so as to ensure that the SSSP is at least as difficult as the BDDP underlying
the somewhat homomorphic scheme.

6 Extension to Large Message Space 

We now show that our scheme provides for a more powerful fully homomorphic
scheme than that of Gentry. In [7] the fully homomorphic property can only
be applied to single bit messages, since the Recrypt algorithm for full size messages
is relatively complicated. We shall show we can obtain fully homomorphic
encryption on $N$-bit messages and then discuss what this actually means.

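First return to our basic scheme. We alter the KeyGen algorithm to output the
whole polynomial $Z(x) = \sum_{i=0}^{N-1} z_i x^i$ modulo $2p$ as the secret key, as opposed to
the single term $B$. Let the resulting polynomial be denoted $B(x) = \sum_{i=0}^{N-1} b_i x^i$.
Encryption is now modified to take any message from the space $\mathcal{B}^{+}_{\infty,N}(2)$, i.e.
any binary polynomial of degree less than $N$. Decryption is then performed
coefficient wise: since the integer $c$ only contributes to the constant coefficient of
$C(\theta) = c - \lfloor c \cdot Z(\theta)/p \rceil \cdot \gamma$ and $\gamma \equiv 1 \bmod 2$, each coefficient $m_i$ of $M$ is recovered by computing

$$m_0 \leftarrow (c - \lfloor c \cdot b_0/p \rceil) \pmod{2}, \qquad m_i \leftarrow \lfloor c \cdot b_i/p \rceil \pmod{2} \quad (1 \le i < N).$$

It is easily seen that this modification results in a somewhat homomorphic
scheme with the same multiplicative depth as the original scheme.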

We now extend this somewhat homomorphic scheme to a fully homomorphic
scheme. We write each coefficient of $B(x)$ as a different sum, over a different set
of indices $S_i$,

$$\sum_{j \in S_i} B_j = b_i.$$

The secret key is now defined to be $\mathrm{sk}_{ij} = 1$ if $j \in S_i$ and 0 otherwise. The
Recrypt algorithm is then immediate. We first apply the Recrypt algorithm as
above, coefficient wise, to obtain new “cleaner” encryptions of each bit of the
message, i.e. we obtain

$$c_i^{\mathrm{new}} = \mathrm{Encrypt}(m_i, PK).$$

To obtain the encryption of the entire message we simply compute

$$c_{\mathrm{new}} = \mathrm{Encrypt}(M, PK) = \sum_{i=0}^{N-1} c_i^{\mathrm{new}} \cdot \alpha^i \pmod{p}.$$
Note that recombining the different encryptions causes an extra increase in the
error term by a factor of $\delta_\infty$. This increase in the error term is due to the
multiplication, by $\alpha^i$, of the error term underlying $c_i^{\mathrm{new}}$.

Hence, we can obtain fully homomorphic encryption with respect to the algebra
$\mathbb{F}_2[x]/(F)$. To see the power of this we need to examine the algebra $\mathbb{F}_2[x]/(F)$.
If $F(x)$ splits as $\prod_{i=1}^{t} f_i \pmod{2}$ with the $f_i$ irreducible and pairwise coprime, and
$\deg f_i = d_i$, then by the Chinese Remainder Theorem we have

$$\mathbb{F}_2[x]/(F) \cong \mathbb{F}_{2^{d_1}} \times \cdots \times \mathbb{F}_{2^{d_t}}.$$

By concentrating on a single component of the product on the right we therefore,
by careful choice of $F$, obtain fully homomorphic encryption in any finite field
of characteristic two of degree less than $N$. Furthermore, we could also obtain
SIMD style homomorphic encryption in multiple finite fields of characteristic
two at the same time.

7 Implementation Results 

We now examine a practical instantiation of our scheme. We take the polynomial
$F(x) = x^{2^n} + 1$, which is always irreducible over the integers. In particular our
main parameter $N$ is equal to $2^n$, and we have $\delta_\infty = N$. We take $\eta = 2^{\sqrt{N}}$
and either $\mu = \sqrt{N}$ or $\mu = 2$. The case of $\eta = 2^{\sqrt{N}}$ and $\mu = \sqrt{N}$ is (for
comparison) also the suggested parameter choice made in [7] (albeit in the
2-norm). The case of $\mu = 2$ is chosen to try to obtain as large a depth for the
somewhat homomorphic scheme as possible.

Recall that if we write $\eta/(2 \cdot \sqrt{N} \cdot \mu) = 2^{\epsilon}$, then the security of our somewhat
homomorphic scheme is assumed to be $2^{N/\epsilon}$. We then select $s_1 = \log p$ and $s_2$
to be such that

$$\sqrt{\binom{s_1}{s_2}} > 2^{N/\epsilon},$$

which ensures the difficulty of the SSSP is at least $2^{N/\epsilon}$. In addition, for our choice
of $F(x)$, the expected multiplicative depth $d$ for our somewhat homomorphic
scheme is estimated by

$$d \log 2 < \log\log\left(\frac{\eta}{2 \cdot \sqrt{N}}\right) - \log\log(N \cdot \mu).$$

We present the implications in the following table, for increasing values of $n$.


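$n$ | $\log_2 p$ | $\mu = 2$: $2^{N/\epsilon}$ | $s_2$ | $d$ | $\mu = \sqrt{N}$: $2^{N/\epsilon}$ | $s_2$ | $d$
8 | 4096 | $2^{25}$ | 5 | 0.3 | $2^{36}$ | 8 | 0.0
9 | 11585 | $2^{31}$ | 6 | 0.8 | $2^{40}$ | 7 | 0.3
10 | 32768 | $2^{41}$ | 7 | 1.2 | $2^{48}$ | 8 | 0.8
11 | 92681 | $2^{54}$ | 8 | 1.7 | $2^{61}$ | 9 | 1.2
12 | 262144 | $2^{73}$ | 10 | 2.1 | $2^{80}$ | 11 | 1.6
13 | 741455 | $2^{100}$ | 12 | 2.5 | $2^{107}$ | 13 | 2.1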
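The entries of this table can be recomputed from the formulas above. The Python sketch below assumes $\eta = 2^{\sqrt{N}}$, $\log_2 p \approx N\sqrt{N}$ (i.e. $p \approx \eta^N$), $s_1 = \log_2 p$, and clamps negative depth estimates at zero. It reproduces the printed $\log_2 p$ and $s_2$ values, while the printed security exponents and depths appear to be rounded slightly differently (the recomputed values stay within one bit and 0.1 respectively).

```python
import math

def section7_estimates(n, log2_mu):
    """(log2 p, N/eps, s2, d) for F(x) = x^N + 1, N = 2^n, assuming eta = 2^sqrt(N)."""
    N = 2 ** n
    log2_eta = math.sqrt(N)
    log2_p = int(N * math.sqrt(N))                       # assumption: p ~ eta^N
    log2_rdec = log2_eta - 1 - math.log2(math.sqrt(N))   # r_Dec ~ eta / (2 sqrt(N))
    eps = log2_rdec - log2_mu                            # eta / (2 sqrt(N) mu) = 2^eps
    sec = N / eps                                        # security level 2^(N/eps)
    s1, s2 = log2_p, 1                                   # assumption: s1 = log2 p
    while 0.5 * math.log2(math.comb(s1, s2)) <= sec:
        s2 += 1                                          # smallest s2 with sqrt(binom) > 2^(N/eps)
    # d log 2 < log log r_Dec - log log(N * mu), since delta_inf = N
    d = math.log2(log2_rdec / (n + log2_mu))
    return log2_p, sec, s2, max(d, 0.0)

for n in range(8, 14):
    a = section7_estimates(n, 1.0)         # mu = 2
    b = section7_estimates(n, n / 2)       # mu = sqrt(N)
    print(n, a[0], f"2^{a[1]:.1f}", a[2], f"{a[3]:.1f}", f"2^{b[1]:.1f}", b[2], f"{b[3]:.1f}")
```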
In the Appendix we make a precise estimate, for each value of $s_2$, of what the
corresponding Recrypt algorithm will produce in terms of the “dirtiness” of
the ciphertext. This allows us to estimate, for each value of $s_2$, the
multiplicative depth which would be required to obtain a fully homomorphic
scheme. In the following table we present this required depth. The given
value includes the final and-gate to recombine two Recrypted ciphertexts. We
note that we only obtain a fully homomorphic scheme if the required depth does not
exceed the depth $d$ supported by the somewhat homomorphic scheme, so we see that
for practical values of $n$ our scheme cannot be made fully homomorphic, although
asymptotically it can be. In fact, in the Appendix we show that for $n > 27$ it is
possible to obtain a fully homomorphic scheme. For a given fixed security level
(and not the maximum possible for a given $N$ and our choice of parameters), it
should be possible to obtain a slightly lower $n$.


$s_2$ | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
required depth | 7 | 7 | 7 | 8 | 8 | 8 | 8 | 8


The above estimates are very crude and we refer to the Appendix for a more 
detailed analysis. 

Despite this problem with obtaining a fully homomorphic scheme, we timed
the various algorithms for the somewhat homomorphic scheme on a desktop
machine using the NTL library. This was an x86-64 platform, which housed 2.4 GHz
Intel Core2 (6600) processor cores and used the GCC 4.3.2 C compiler. We were
unable to generate keys for the parameter size of $N = 2^{12}$, and for smaller values of
$N$ key generation could take many hours. The problem with KeyGen is the
need to compute many resultants and test the resulting number for primality.
This is because the output of the resultant calculation will have $\log_2 p$ bits, so
not only are we working with huge numbers, we also have little chance that this
number is prime on any one iteration. A more general version of KeyGen would
allow for non-prime, but squarefree resultants. But even in this case obtaining
keys for, say, $n = 15$ seems daunting. We thus do not present times for the KeyGen
algorithm. The times (in milliseconds), and the actual value of $d$ computed for
the specific key, are presented in the following table:


$n$ | Encrypt | Decrypt | Mult | $d$ ($\mu = 2$) | $d$ ($\mu = \sqrt{N}$)
8 | 4.2 | 0.2 | 0.2 | 1.0 | 0.0
9 | 38.8 | 0.3 | 0.2 | 1.5 | 1.0
10 | 386.4 | 0.6 | 0.4 | 2.0 | 1.0
11 | 3717.2 | 3.0 | 1.6 | 2.5 | 1.5

We see that in practice our scheme appears to support a larger circuit depth
before decryption fails than theory predicts, although still not deep enough to
enable fully homomorphic encryption, at least at practical key sizes.
A Analysis of the Recrypt Procedure

In this appendix we explain exactly how Gentry’s re-encryption circuit is implemented
in the context of our scheme. We first decrease the size of $r_{\mathrm{Dec}}$ by a
factor of two to ensure that the floating point number obtained in the decryption
procedure is within 1/4 of an integer, i.e. we know that

$$c \cdot B/p \in \, ]x - 1/4, x + 1/4[,$$
for some integer $x$. Since we are only interested in the result modulo 2, we can
actually compute

$$\left\lfloor \sum_{i=1}^{s_1} \mathrm{sk}_i \cdot (c \cdot B_i \bmod 2p)/p \right\rceil \pmod{2}.$$

As such we are adding up a subset of $s_2$ values out of $s_1$ floating point values,
all of which lie in the range $[0, \ldots, 2)$.

To keep the Recryption method manageable we need to minimize the precision
of the floating point numbers $(c \cdot B_i \bmod 2p)/p$ that we work with. First note
that if we truncate these values to a fixed precision and make a maximum error
of $< 1/2$, then we can still recover the correct result since the approximated
sum will be in the interval $]x - 3/4, x + 1/4[$, and any number in this interval
determines $x$ uniquely. More precisely, if we obtain bits $e_0$, $e_1$ and $e_2$ such that
the sum computed with fixed precision is given by

$$e_0 + 2^{-1} \cdot e_1 + 2^{-2} \cdot e_2 + \cdots,$$

then the final output is given by

$$(e_0 + e_1 + e_2 + e_1 \cdot e_2) \pmod{2}.$$

The above equation is derived from examining the four possible cases corresponding
to the values of $e_1$ and $e_2$:

$e_1$ | $e_2$ | Output
0 | 0 | $e_0$
1 | 0 | $e_0 + 1 \pmod{2}$
0 | 1 | $e_0 + 1 \pmod{2}$
1 | 1 | $e_0 + 1 \pmod{2}$


Assume we work with $t$ bits of precision, i.e. each floating point number in
$[0, \ldots, 2)$ is represented as $\sum_{i=0}^{t-1} e_i 2^{-i}$. Then since only $s_2$ numbers are nonzero,
the maximum error in the total sum is given by

$$s_2 \cdot \sum_{i=t}^{\infty} 2^{-i} = s_2 \cdot 2^{-t+1}.$$

As such we need to choose $t$ such that $s_2 \cdot 2^{-t+1} \le 1/2$ (the truncation error of each
term is strictly smaller than $2^{-t+1}$, so the total error is then $< 1/2$), which holds for
$t = \lceil \log_2 s_2 \rceil + 2$. We also define $s$ to be the number of bits to represent all integers
up to $s_2$, i.e. $s = \lfloor \log_2 s_2 \rfloor + 1$. To get some idea of the practical implications of
these two values in what follows we give the following table:


$s_2$ | $s$ | $t$
5 | 3 | 5
6 | 3 | 5
7 | 3 | 5
8 | 4 | 5
9 | 4 | 6
10 | 4 | 6
11 | 4 | 6
12 | 4 | 6
13 | 4 | 6
14 | 4 | 6




438 


N.P. Smart and F. Vercauteren 


So we see that the value of s is essentially either 3 or 4, and t is either 5 or 6. 
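Both quantities are bit lengths, so they can be computed exactly with integer operations. The sketch below (`precision_params` is a hypothetical helper name) reproduces the table and checks the truncation error bound:

```python
def precision_params(s2):
    """(s, t) for a given s2: s = floor(log2 s2) + 1, t = ceil(log2 s2) + 2."""
    s = s2.bit_length()               # number of bits of s2
    t = (s2 - 1).bit_length() + 2     # ceil(log2 s2) + 2 for s2 >= 1
    return s, t

for s2 in range(5, 15):
    s, t = precision_params(s2)
    # Each truncation error is < 2^(1-t), so the total error is < s2 * 2^(1-t) <= 1/2.
    assert s2 * 2.0 ** (1 - t) <= 0.5
    print(s2, s, t)
```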

The re-encryption algorithm takes as input a ciphertext $c$ and a public key
$PK = (p, \alpha, s_1, s_2, \{c_i, B_i\}_{i=1}^{s_1})$, and consists of the following distinct phases:

1. Write down the first $t$ bits of the $s_1$ floating point numbers $(c \cdot B_i \bmod 2p)/p$
as an $s_1 \times t$ matrix $(b_{ij})$ for $i = 1, \ldots, s_1$ and $j = 1, \ldots, t$.

2. Encrypt each of the bits under the public key $PK$ to obtain an $s_1 \times t$
matrix of clean ciphertexts $(c_{ij})$.

3. Multiply each row of the matrix by the corresponding encryption $c_i$ of $\mathrm{sk}_i$ to
obtain $(c_i \cdot c_{ij}) \bmod p$. As such we obtain the encryption of a matrix with
only $s_2$ non-zero rows.

4. Compute the sum of each column as the Hamming weight using symmetric
polynomials and hence reduce the sum of $s_1$ floating point values to the sum
of $t$ floating point values of $t$ bits of precision. More precisely, denote by $h_{ij}$
the $j$-th bit of the Hamming weight of the $i$-th column for $i = 1, \ldots, t$ and
$j = 1, \ldots, s$, and form the $t \times t$ matrix $(H_{ij})$ with $H_{ij} = h_{i, j-i+s}$ whenever
the right hand side is defined and zero otherwise.

5. Merge rows of the matrix $H$, so as to obtain an $s \times t$ matrix $H'$ such that
the sum of the rows of $H'$ equals the sum of the rows of $H$.

6. Apply carry-save-adders to progressively reduce the matrix to one with two
rows. Each set of three rows is reduced to two, and then this procedure is
repeated.

7. Perform the final addition, and output the encryption of a single bit.
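To see that the seven phases compute the right bit, the following Python sketch runs them entirely in the clear: nothing is encrypted, so Phases 2 and 3 reduce to multiplying bits by $\mathrm{sk}_i$, and the carry-save additions of Phases 5 to 7 are replaced by ordinary integer addition. The prime, the secret $B$ and the sizes $s_1 = 40$, $s_2 = 5$ are hypothetical toy values, and only ciphertexts with $c \cdot B/p$ within $1/4$ of an integer are used, as assumed at the start of this appendix.

```python
import random

def recrypt_in_the_clear(c, p, bs, sk, t):
    """Plaintext walk-through of the Recrypt phases; nothing is encrypted here."""
    # Phase 1: first t bits of (c * B_i mod 2p) / p in [0, 2); bit j has weight 2^-j.
    trunc = [((c * b) % (2 * p)) * 2 ** (t - 1) // p for b in bs]
    bits = [[(v >> (t - 1 - j)) & 1 for j in range(t)] for v in trunc]
    # Phases 2-3 would encrypt the bits and multiply row i by Encrypt(sk_i).
    rows = [[sk_i * bit for bit in row] for sk_i, row in zip(sk, bits)]
    # Phase 4: Hamming weight of every column.
    weights = [sum(row[j] for row in rows) for j in range(t)]
    # Phases 5-7: add the columns with weights 2^-j and keep the result modulo 2.
    total = sum(w << (t - 1 - j) for j, w in enumerate(weights)) % (2 ** t)
    e0, e1, e2 = (total >> (t - 1)) & 1, (total >> (t - 2)) & 1, (total >> (t - 3)) & 1
    return (e0 + e1 + e2 + e1 * e2) % 2

def demo(seed=4, p=1_000_003, s1=40, s2=5):
    """Check Recrypt-in-the-clear against round(c * B / p) mod 2 (hypothetical toy values)."""
    random.seed(seed)
    b = random.randrange(2 * p)                   # hypothetical secret B in [0, 2p)
    while True:                                   # hidden s2-subset of the B_i summing to B
        bs = [random.randint(-p, p) for _ in range(s1)]
        subset = random.sample(range(s1), s2)
        bs[subset[-1]] = b - sum(bs[i] for i in subset[:-1])
        if -p <= bs[subset[-1]] <= p:
            break
    sk = [int(i in subset) for i in range(s1)]
    t = (s2 - 1).bit_length() + 2                 # t = ceil(log2 s2) + 2
    checked = 0
    while checked < 200:
        c = random.randrange(p)
        x, rem = divmod(2 * c * b + p, 2 * p)     # x = round(c * B / p)
        if abs(rem - p) >= p // 2:                # keep only |c * B / p - x| < 1/4
            continue
        assert recrypt_in_the_clear(c, p, bs, sk, t) == x % 2
        checked += 1
    return checked

print(demo())   # 200
```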

It is perhaps worth recalling that for our scheme we have $s_1 \approx \log_2 p$ and $s_2$ is
chosen so that $\sqrt{\binom{s_1}{s_2}} > 2^{\lambda}$ for our required security level $\lambda$, which itself defines
the parameter $N$. The value $p$ is approximately equal to $2^{N\sqrt{N}}$, thus $s_1$ is very
large indeed. Ciphertexts are integers modulo $p$ and each valid decryptable ciphertext
can be considered to lie in a ball of a given radius. A “clean” ciphertext
lies in a ball of radius $\mu + 1$ and as we add/multiply ciphertexts this radius
increases, with the ciphertext becoming increasingly “dirtier”. Recall that we
have the following behaviour: Let $c_1$ and $c_2$ denote two ciphertexts, corresponding
to two randomizations $C_1(x) = M_1 + N_1(x)$ and $C_2(x) = M_2 + N_2(x)$,
where $M_i \in \{0,1\}$ are the messages and $N_i(x) \in \mathcal{B}_{\infty,N}(r_i - 1)$ is the randomness,
i.e. $C_i(x) \in \mathcal{B}_{\infty,N}(r_i)$. For a ciphertext $c$, denote with $\mathrm{rad}(c)$ the radius
of the ball containing the corresponding polynomial $C(x)$, i.e. we have
$C(x) \in \mathcal{B}_{\infty,N}(\mathrm{rad}(c))$. Then

$$\mathrm{rad}(c_1 + c_2) = \mathrm{rad}(c_1) + \mathrm{rad}(c_2),$$
$$\mathrm{rad}(c_1 \cdot c_2) = \delta_\infty \cdot \mathrm{rad}(c_1) \cdot \mathrm{rad}(c_2).$$

We will now analyze the growth of the error terms during each of the phases of
the re-encryption. Recall that for our choice of parameters we have $\mu = \sqrt{N}$,
$\delta_\infty = N$ and $s_1 = N\sqrt{N}$. Therefore define $\rho = \sqrt{N}$; we will then compute
explicit expressions for the radii of the ciphertexts as a function of $\rho$. Recall that
the notation $f \sim g$ means that $\lim_{\rho \to \infty} f/g = 1$. If $A$ is a matrix of ciphertexts
we let $\mathrm{rad}(A)$ denote the matrix obtained by applying $\mathrm{rad}$ to each entry in $A$.
Stages [1] and [2]

The result of the first two stages is that we obtain an $s_1 \times t$ matrix $A_2$ containing
clean ciphertexts $c_{ij}$ with $\mathrm{rad}(c_{ij}) = \mu + 1 := r_2 \sim \rho$.

Stage [3]

Here we take the clean encryptions of the values of $\mathrm{sk}_i$ and multiply through each
corresponding row to obtain a matrix $A_3$, with $(\mathrm{rad}(A_3))_{ij} = \delta_\infty \cdot r_2^2 := r_3 \sim \rho^4$
for all $i, j$.

Stage [4]

In this stage, we need to compute the encryption of the sum of the (plaintext)
bits $\mathrm{sk}_i \cdot b_{ij}$ in each of the columns separately. Note that the sum is simply the
Hamming weight of the column, so it suffices to compute the bits of the Hamming
weight. Furthermore, since only $s_2$ entries in each column are one, the number of
bits in the Hamming weight is bounded by $s$. We let $\mathrm{SymPol}_i(x_1, \ldots, x_k)$ denote
the $i$-th elementary symmetric polynomial on the variables $x_1, \ldots, x_k$. Then the bits of the
Hamming weight $w$ of the bit vector $(b_1, \ldots, b_k)$ are given by

$$(\mathrm{SymPol}_{2^{s-1}}(b_1, \ldots, b_k) \bmod 2, \ldots, \mathrm{SymPol}_{2^0}(b_1, \ldots, b_k) \bmod 2),$$

since $\mathrm{SymPol}_{2^e}(b_1, \ldots, b_k) = \binom{w}{2^e}$, which by Lucas' theorem is odd exactly when bit $e$ of $w$ is set.

So for each column of our matrix A 3 we need to compute all the symmetric 
polynomials up to 5 = 2 S ~ 4 . To compute the S symmetric polynomials on 
(the encryptions of) si bits we proceed in a recursive manner, essentially using 
Horner’s Rule to compute the last 5+1 coefficients of the product rii=i(^i' ;r +l)- 
For each column of A 3 we execute the following function to compute the 
(encryptions) of the bits of the Hamming weight of the j-th column for j = 

— Set $(s_1, \ldots, s_S) \leftarrow (0, \ldots, 0)$.

— For $i = 1, \ldots, S_1$ do

  • For $k = \min(i, S), \ldots, 3, 2$ do

    * $s_k \leftarrow s_k + s_{k-1} \cdot A_3(i,j) \pmod p$

  • $s_1 \leftarrow s_1 + A_3(i,j) \pmod p$.

— Return $(s_1, \ldots, s_S)$.
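As a plaintext sanity check of the function above, the following minimal Python sketch runs the same update rule on bits, with arithmetic mod 2 standing in for ciphertext arithmetic mod $p$, and reads off the Hamming-weight bits as $\mathrm{SymPol}_{2^m} \bmod 2$. The function names are illustrative only and are not part of the scheme.

```python
def symmetric_polys_mod2(bits, S):
    """Return [e_1, ..., e_S] mod 2 for the 0/1 list `bits`.

    Same update rule as the function in the text, but on plaintext bits
    with arithmetic mod 2 instead of ciphertexts mod p.
    """
    s = [0] * (S + 1)  # s[k] holds e_k; s[0] is unused
    for i, b in enumerate(bits, start=1):
        # k runs downwards so that s[k - 1] still holds its previous value
        for k in range(min(i, S), 1, -1):
            s[k] = (s[k] + s[k - 1] * b) % 2
        s[1] = (s[1] + b) % 2
    return s[1:]


def hamming_weight_bits(bits, s):
    """Bits of the Hamming weight of `bits`, most significant first,
    read off as SymPol_{2^m}(bits) mod 2 for m = s-1, ..., 0."""
    S = 2 ** (s - 1)
    e = symmetric_polys_mod2(bits, S)
    return [e[2 ** m - 1] for m in range(s - 1, -1, -1)]


print(hamming_weight_bits([1, 0, 1, 1, 0, 1, 1], 3))  # weight 5 -> [1, 0, 1]
```

Only the entries $e_1, e_2, e_4, \ldots$ are read off, but the recursion needs all intermediate $e_k$, which is why the function in the text returns the full vector $(s_1, \ldots, s_S)$.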

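By analysing the above algorithm we can also see how dirty the ciphertexts will become. To produce $s_i$ we need to sum $\binom{S_1}{i}$ terms, each of which is the product of $i$ of the ciphertexts in $A_3$. If $c_1, \ldots, c_8$ are eight ciphertexts given by entries of $A_3$, then, forming the products as balanced binary trees (so that $\mathrm{rad}(c_1 \cdots c_{2k}) \sim \delta_\infty \cdot \mathrm{rad}(c_1 \cdots c_k)^2$), we have

$$\mathrm{rad}(c_1) \sim \rho^4, \quad \mathrm{rad}(c_1 \cdot c_2) \sim \rho^{10}, \quad \mathrm{rad}(c_1 \cdots c_4) \sim \rho^{22}, \quad \mathrm{rad}(c_1 \cdots c_8) \sim \rho^{46}.$$

Thus we have

$$\mathrm{rad}(s_1) \sim S_1 \cdot \rho^4 \sim \rho^7, \qquad \mathrm{rad}(s_2) \sim \binom{S_1}{2} \cdot \rho^{10} \sim \rho^{16}/2,$$
$$\mathrm{rad}(s_4) \sim \binom{S_1}{4} \cdot \rho^{22} \sim \rho^{34}/4!, \qquad \mathrm{rad}(s_8) \sim \binom{S_1}{8} \cdot \rho^{46} \sim \rho^{70}/8!.$$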
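The radius bookkeeping above can be reproduced mechanically. The sketch below stores an asymptotic radius $c \cdot \rho^e$ as a pair `(c, e)` and assumes, as in the text, $\delta_\infty = \rho^2$, $S_1 = \rho^3$, entries of $A_3$ with radius $\sim \rho^4$, and products formed as balanced binary trees. The helpers `mul`, `product_radius` and `symmetric_poly_radius` are hypothetical names used only for illustration.

```python
from fractions import Fraction
from math import factorial

# An asymptotic radius c * rho**e is stored as the pair (c, e).
# Assumptions taken from the text: delta_inf = rho**2, S_1 = rho**3,
# and every entry of A_3 has radius ~ rho**4.
DELTA_EXP = 2
S1_EXP = 3
A3_RADIUS = (Fraction(1), 4)


def mul(r1, r2):
    """rad(c1 * c2) = delta_inf * rad(c1) * rad(c2)."""
    return (r1[0] * r2[0], r1[1] + r2[1] + DELTA_EXP)


def product_radius(k):
    """Radius of a product of k = 2**m entries of A_3, multiplied as a balanced tree."""
    r = A3_RADIUS
    while k > 1:
        r = mul(r, r)
        k //= 2
    return r


def symmetric_poly_radius(k):
    """rad(s_k) ~ binom(S_1, k) * rad(c_1 ... c_k), with binom(S_1, k) ~ S_1**k / k!."""
    c, e = product_radius(k)
    return (c / factorial(k), e + S1_EXP * k)


for k in (1, 2, 4, 8):
    c, e = symmetric_poly_radius(k)
    print(f"rad(s_{k}) ~ rho^{e} / {1 / c}")
```

The loop prints the exponents 7, 16, 34 and 70 with denominators 1, 2, 24 and 40320, matching the estimates above.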
Given the bits $h_{i,j}$ for $i = 1, \ldots, t$ and $j = 1, \ldots, s$ of the $t$ Hamming weights of the $t$ columns (with $h_{i,1}$ the most significant and $h_{i,s}$ the least significant bit, and column $i$ carrying weight $2^{1-i}$), the sum of the resulting floating point numbers represented by the rows of $A_3$ is given by

$$\sum_{i=1}^{t} 2^{1-i} \sum_{j=1}^{s} 2^{s-j} \, h_{i,j}.$$

Since we are only interested in this sum modulo 2, we can see that it corresponds to the sum of the rows of the $t \times t$ matrix $H$ with $H_{i,j} = h_{i,\,j-i+s}$ whenever the right-hand side is defined, and zero otherwise. We therefore obtain the following matrices $H$ depending on the combination of $s$ and $t$.


Case (s,f) = (3,5): 
We find 


rad (H) 


Case ( s,t) = (4,5): 
We find 


rad(£0 


l P 7 

0 

0 

0 

°\ 

P 16 /2 

p 7 

0 

0 

0 

p 34 /4! 

p 16 / 2 

P 7 

0 

0 

0 

p 34 /4! 

P 16 /2 

P 7 

0 

V 0 

0 

p 34 /4! 

P 16 /2 

p 7 / 


l P 7 

0 

0 

0 

°\ 

P 16 /2 

p 7 

0 

0 

0 

p 34 /4! 

p 16 / 2 

P 7 

0 

0 

p 70 / 8! 

p 34 /4! 

P 16 / 2 

P 7 

0 

V 0 

p 70 / 8! 

p 34 /4! 

P 16 / 2 

p 7 / 


Case (s,t) = (4,6): 
We find 


rad (H) 


( P 7 

P 16 /2 

p 34 /4! 

p 70 / 8! 
0 

V 0 


0 0 0 0 0 \ 

p 7 0 0 0 0 

p 16 /2 p 7 0 0 0 

p 34 /4! p 16 /2 p 7 0 0 

p 70 / 8! p 34 /4! P 16 / 2 P 7 0 

0 p 70 / 8! p 34 /4! p 16 /2 p 7 
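To see why summing the rows of $H$ gives the right value modulo 2, the following sketch builds $H$ from plaintext Hamming-weight bits and compares the result with the direct fixed-point sum. It assumes that column $i$ of $A_3$ carries weight $2^{1-i}$ and that $h_{i,1}$ is the most significant bit. The helper names and the sample column weights are hypothetical.

```python
from fractions import Fraction


def build_H(h, s):
    """t x t matrix with H[i][j] = h[i][j - i + s] (1-based indices), else 0.

    h[i] lists the s Hamming-weight bits of column i of A_3, most
    significant first, so h_{i,s} is the least significant bit.
    """
    t = len(h)
    H = [[0] * t for _ in range(t)]
    for i in range(1, t + 1):
        for j in range(1, t + 1):
            k = j - i + s
            if 1 <= k <= s:
                H[i - 1][j - 1] = h[i - 1][k - 1]
    return H


def fixed_point_mod2(rows):
    """Sum of the rows read as fixed-point numbers (column j has weight 2**(1-j)), mod 2."""
    total = sum(Fraction(bit, 2 ** (j - 1))
                for row in rows for j, bit in enumerate(row, start=1))
    return Fraction(total) % 2


# Hypothetical column weights of A_3 (each at most 2**s - 1), here with s = 3, t = 5.
weights = [5, 3, 7, 2, 6]
s = 3
h = [[int(c) for c in format(w, f"0{s}b")] for w in weights]
direct = sum(Fraction(w, 2 ** i) for i, w in enumerate(weights)) % 2
print(direct, fixed_point_mod2(build_H(h, s)))  # both 7/8
```

Bits that would land to the left of column 1 have weight at least $2$ and vanish modulo 2, which is why $H$ can simply drop them.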


Stage [5] 

We now notice that the entries in each column can be permuted independently. Due to the way we will add up the columns in Stage [6], it makes more sense to order the entries of each column so that the dirtiness increases as one descends the column, with the zero entries at the top. This also allows us to delete rows which consist entirely of zeros, so that the resulting matrix $H'$ is of size $s \times t$. In the three examples we do not give the precise permutation applied to each column, as this can be deduced from the implied permutation of the $\mathrm{rad}$ values.


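Case $(s,t) = (3,5)$:
We find

$$
\mathrm{rad}(H') = \begin{pmatrix}
\rho^{7} & \rho^{7} & \rho^{7} & 0 & 0 \\
\rho^{16}/2 & \rho^{16}/2 & \rho^{16}/2 & \rho^{7} & 0 \\
\rho^{34}/4! & \rho^{34}/4! & \rho^{34}/4! & \rho^{16}/2 & \rho^{7}
\end{pmatrix}.
$$

Case $(s,t) = (4,5)$:
We find

$$
\mathrm{rad}(H') = \begin{pmatrix}
\rho^{7} & \rho^{7} & 0 & 0 & 0 \\
\rho^{16}/2 & \rho^{16}/2 & \rho^{7} & 0 & 0 \\
\rho^{34}/4! & \rho^{34}/4! & \rho^{16}/2 & \rho^{7} & 0 \\
\rho^{70}/8! & \rho^{70}/8! & \rho^{34}/4! & \rho^{16}/2 & \rho^{7}
\end{pmatrix}.
$$

Case $(s,t) = (4,6)$:
We find

$$
\mathrm{rad}(H') = \begin{pmatrix}
\rho^{7} & \rho^{7} & \rho^{7} & 0 & 0 & 0 \\
\rho^{16}/2 & \rho^{16}/2 & \rho^{16}/2 & \rho^{7} & 0 & 0 \\
\rho^{34}/4! & \rho^{34}/4! & \rho^{34}/4! & \rho^{16}/2 & \rho^{7} & 0 \\
\rho^{70}/8! & \rho^{70}/8! & \rho^{70}/8! & \rho^{34}/4! & \rho^{16}/2 & \rho^{7}
\end{pmatrix}.
$$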


Stage [6]

We now apply a sequence of carry-save adders to reduce the number of rows down to two. We first apply a single chain of carry-save adders to add the bits in the first three rows of the matrix $H'$. This produces two rows as output: the first row simply contains the XOR of the three rows, and the second row contains the sum of all products of two out of the three rows, shifted one position to the left (towards the more significant bit positions). Note that we ignore any overflow into the bit position corresponding to the binary weight $2^1$ and above. If $H'$ has four rows, we then append the fourth row to the result and apply another chain of carry-save adders. At the end of this stage we have a matrix $A_5$ of dimension $2 \times t$.

From the above estimates for $\mathrm{rad}(H')$ we can then derive the following estimates for $\mathrm{rad}(A_5)$:


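Case $(s,t) = (3,5)$:
We find

$$
\mathrm{rad}(A_5) = \begin{pmatrix}
\rho^{34}/4! & \rho^{34}/4! & \rho^{34}/4! & \rho^{16}/2 & \rho^{7} \\
\rho^{52}/48 & \rho^{52}/48 & \rho^{25}/2 & 0 & 0
\end{pmatrix}.
$$

Case $(s,t) = (4,5)$:
We find

$$
\mathrm{rad}(A_5) = \begin{pmatrix}
\rho^{70}/8! & \rho^{70}/8! & \rho^{34}/4! & \rho^{16}/2 & \rho^{7} \\
\rho^{106}/(4! \cdot 8!) & \rho^{52}/(2 \cdot 4!) & \rho^{25}/2 & 0 & 0
\end{pmatrix}.
$$

Case $(s,t) = (4,6)$:
We find

$$
\mathrm{rad}(A_5) = \begin{pmatrix}
\rho^{70}/8! & \rho^{70}/8! & \rho^{70}/8! & \rho^{34}/4! & \rho^{16}/2 & \rho^{7} \\
\rho^{124}/(2 \cdot 4! \cdot 8!) & \rho^{106}/(4! \cdot 8!) & \rho^{52}/(2 \cdot 4!) & \rho^{25}/2 & 0 & 0
\end{pmatrix}.
$$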
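The estimates for $\mathrm{rad}(H')$ and $\mathrm{rad}(A_5)$ can be checked with a small leading-term calculator. This sketch assumes $\delta_\infty = \rho^2$ and keeps only the leading term of each radius. It builds $\mathrm{rad}(H)$, sorts each column by increasing dirtiness with zeros first, drops all-zero rows (Stage [5]), and then applies the carry-save chains of Stage [6]. In these chains the carry of column $j$ moves to column $j-1$, and the carry out of the leftmost column is discarded. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

# A radius c * rho**e is stored as (c, e); None stands for a constant-zero entry.
# Assumptions: delta_inf = rho**2 and only the leading term of a radius is kept.
DELTA_EXP = 2
# Radii of the Hamming-weight bits, least significant first (from s_1, s_2, s_4, s_8).
BIT_RADII = [(Fraction(1), 7), (Fraction(1, 2), 16),
             (Fraction(1, factorial(4)), 34), (Fraction(1, factorial(8)), 70)]


def add(*rs):
    """rad of a sum: leading term of the sum of the radii."""
    nz = [r for r in rs if r is not None]
    if not nz:
        return None
    e = max(r[1] for r in nz)
    return (sum(r[0] for r in nz if r[1] == e), e)


def mul(r1, r2):
    """rad(c1 * c2) = delta_inf * rad(c1) * rad(c2); a zero factor gives zero."""
    if r1 is None or r2 is None:
        return None
    return (r1[0] * r2[0], r1[1] + r2[1] + DELTA_EXP)


def dirtiness(r):
    """Sort key: zero entries first, then by exponent and coefficient."""
    return (-1, 0) if r is None else (r[1], r[0])


def rad_H(s, t):
    """rad(H) with H[i][j] = h_{i, j-i+s}; h_{i,s} is the least significant bit."""
    H = [[None] * t for _ in range(t)]
    for i in range(t):
        for j in range(t):
            k = j - i + s  # same value for 0-based and 1-based i, j
            if 1 <= k <= s:
                H[i][j] = BIT_RADII[s - k]
    return H


def rad_H_prime(H):
    """Stage [5]: sort every column by increasing dirtiness, drop all-zero rows."""
    t = len(H)
    cols = [sorted((H[i][j] for i in range(t)), key=dirtiness) for j in range(t)]
    rows = [[cols[j][i] for j in range(t)] for i in range(t)]
    return [row for row in rows if any(r is not None for r in row)]


def csa(r1, r2, r3):
    """One carry-save chain: the XOR row and the carry (majority) row,
    shifted one column to the left; the carry out of column 0 is dropped."""
    xor = [add(a, b, c) for a, b, c in zip(r1, r2, r3)]
    maj = [add(mul(a, b), mul(b, c), mul(a, c)) for a, b, c in zip(r1, r2, r3)]
    return xor, maj[1:] + [None]


def rad_A5(s, t):
    rows = rad_H_prime(rad_H(s, t))
    top, bottom = csa(*rows[:3])
    for extra in rows[3:]:  # a fourth row (s = 4) needs a second chain
        top, bottom = csa(top, bottom, extra)
    return top, bottom
```

For example, `rad_A5(4, 6)` returns the two rows displayed above for the case $(s,t) = (4,6)$, with `None` in place of the zero entries.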


Stage [7]

The final stage is to add the two remaining rows together and then use the analysis from earlier. More precisely, as before we write the final output as

$$e_0 + 2^{-1} \cdot e_1 + 2^{-2} \cdot e_2 + \cdots,$$

where, indexing the columns of $A_5$ from $0$ to $t-1$, for $i = t-1, \ldots, 0$ we have

$$e_i = (A_5)_{1,i} + (A_5)_{2,i} + c_{i+1},$$
$$c_i = \big((A_5)_{1,i} + (A_5)_{2,i}\big) \cdot c_{i+1} + (A_5)_{1,i} \cdot (A_5)_{2,i},$$

with $c_t = 0$. Note that the last two entries of the second row of each of the above matrices are zero, so there are no carries to worry about from these columns. This simplifies the above expressions a little. We then obtain in each of our cases:


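— Case $(s,t) = (3,5)$:

$$\mathrm{rad}(e_0, e_1, e_2, e_3, e_4) = \left(\frac{\rho^{115}}{(2 \cdot 4!)^2}, \frac{\rho^{61}}{2 \cdot 4!}, \frac{\rho^{34}}{4!}, \frac{\rho^{16}}{2}, \rho^{7}\right)$$

— Case $(s,t) = (4,5)$:

$$\mathrm{rad}(e_0, e_1, e_2, e_3, e_4) = \left(\frac{\rho^{133}}{2 \cdot 4! \cdot 8!}, \frac{\rho^{70}}{8!}, \frac{\rho^{34}}{4!}, \frac{\rho^{16}}{2}, \rho^{7}\right)$$

— Case $(s,t) = (4,6)$:

$$\mathrm{rad}(e_0, e_1, e_2, e_3, e_4, e_5) = \left(\frac{\rho^{241}}{2(4! \cdot 8!)^2}, \frac{\rho^{133}}{2 \cdot 4! \cdot 8!}, \frac{\rho^{70}}{8!}, \frac{\rho^{34}}{4!}, \frac{\rho^{16}}{2}, \rho^{7}\right)$$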
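On plaintext bits, the recurrence for $e_i$ and $c_i$ is an ordinary ripple-carry adder that works from the least significant column towards column 0. The sketch below checks this against integer addition modulo $2^t$. It uses hypothetical helper names, and its columns are indexed $0, \ldots, t-1$ as above.

```python
def add_rows(x, y):
    """Add two fixed-point bit rows modulo 2 with the recurrence of Stage [7].

    Column 0 has weight 2**0 and column i has weight 2**-i. Working from
    i = t-1 down to 0 with c_t = 0:
        e_i = x_i + y_i + c_{i+1}                (mod 2)
        c_i = (x_i + y_i) * c_{i+1} + x_i * y_i  (mod 2)
    The final carry c_0 would have weight 2**1 and is dropped.
    """
    t = len(x)
    e = [0] * t
    carry = 0  # holds c_{i+1}; starts as c_t = 0
    for i in range(t - 1, -1, -1):
        e[i] = (x[i] + y[i] + carry) % 2
        carry = ((x[i] + y[i]) * carry + x[i] * y[i]) % 2
    return e


def to_bits(v, t):
    """t-bit binary expansion of the integer v, most significant bit first."""
    return [int(ch) for ch in format(v, f"0{t}b")]


# Rows scaled by 2**(t-1) are integers, so the recurrence must agree with
# integer addition modulo 2**t.
t = 5
assert all(add_rows(to_bits(a, t), to_bits(b, t)) == to_bits((a + b) % 2 ** t, t)
           for a in range(2 ** t) for b in range(2 ** t))
```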


As discussed earlier, we can obtain the result Res of the Recryption procedure from the values of $e_0$, $e_1$ and $e_2$ by computing $(e_0 + e_1 + e_2 + e_1 \cdot e_2) \pmod 2$. This enables us to determine the value of $\mathrm{rad}(\mathrm{Res})$ in our three cases as follows:

$$
\begin{array}{c|ccc}
(s,t) & (3,5) & (4,5) & (4,6) \\ \hline
\mathrm{rad}(\mathrm{Res}) & \frac{\rho^{115}}{(2 \cdot 4!)^2} & \frac{\rho^{133}}{2 \cdot 4! \cdot 8!} & \frac{\rho^{241}}{2(4! \cdot 8!)^2}
\end{array}
$$
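The table entries follow from the rows of $\mathrm{rad}(A_5)$ by applying $\mathrm{rad}$ to the recurrence of Stage [7], and so does the quantity $\delta_\infty \cdot \mathrm{rad}(\mathrm{Res})^2$. Below is a leading-term sketch. It uses the same hypothetical `(c, e)` representation as the other sketches and assumes $\delta_\infty = \rho^2$, with `None` marking a zero entry.

```python
from fractions import Fraction
from math import factorial

# Leading-term radius c * rho**e stored as (c, e); None is a zero entry.
# Assumption as in the text: delta_inf = rho**2.
DELTA_EXP = 2


def add(*rs):
    nz = [r for r in rs if r is not None]
    if not nz:
        return None
    e = max(r[1] for r in nz)
    return (sum(r[0] for r in nz if r[1] == e), e)


def mul(r1, r2):
    if r1 is None or r2 is None:
        return None
    return (r1[0] * r2[0], r1[1] + r2[1] + DELTA_EXP)


def stage7(top, bottom):
    """rad(e_0, ..., e_{t-1}), rad(Res) and delta_inf * rad(Res)**2 from rad(A_5)."""
    t = len(top)
    e = [None] * t
    c = None  # rad(c_t) = 0
    for i in range(t - 1, -1, -1):
        a, b = top[i], bottom[i]
        e[i] = add(a, b, c)
        c = add(mul(add(a, b), c), mul(a, b))
    res = add(e[0], e[1], e[2], mul(e[1], e[2]))  # Res = e0 + e1 + e2 + e1*e2
    return e, res, mul(res, res)


# rad(A_5) for (s, t) = (4, 6) as listed in the text.
f8 = factorial(8)
top = [(Fraction(1, f8), 70)] * 3 + [(Fraction(1, 24), 34), (Fraction(1, 2), 16), (Fraction(1), 7)]
bottom = [(Fraction(1, 48 * f8), 124), (Fraction(1, 24 * f8), 106),
          (Fraction(1, 48), 52), (Fraction(1, 2), 25), None, None]
e, res, rad_c = stage7(top, bottom)
print(res[1], rad_c[1])  # 241 484
```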
So, using the above method, we can Recrypt a ciphertext to obtain a new ciphertext whose dirtiness measure is bounded by the radius in the above table. We then operate on this ciphertext by applying an addition or a multiplication with another similar ciphertext, producing a new ciphertext to which we then apply the Recrypt procedure again. For this to work we require that the ciphertext before Recryption can itself be validly decrypted. Since a multiplication is the worse case, this means that we need to be able to decrypt a ciphertext with dirtiness measure $\delta_\infty \cdot \mathrm{rad}(\mathrm{Res})^2$.

In the following table we present the final outcome. For specific values of $S_2$ we give the values of $(s,t)$ in the algorithm, then the value of $\mathrm{rad}(c)$ that must be decryptable in order to obtain fully homomorphic encryption, and then the corresponding minimum value of the "depth" of the circuit. We note that this measure of depth is a very crude estimate, since it counts the number of multiplications in a perfectly balanced circuit consisting solely of multiplications, whereas our measure $\mathrm{rad}(c)$ is much more precise.

$$
\begin{array}{cccc}
S_2 & (s,t) & \mathrm{rad}(c) & \text{depth} \\ \hline
5, 6, 7 & (3,5) & \frac{\rho^{232}}{(2 \cdot 4!)^4} & 7 \\
8 & (4,5) & \frac{\rho^{268}}{(2 \cdot 4! \cdot 8!)^2} & 7 \\
9, \ldots, 14 & (4,6) & \frac{\rho^{484}}{2^2 (4! \cdot 8!)^4} & 8
\end{array}
$$


Recall that the original $r_{\mathrm{dec}}$ is given by $2^{\sqrt{N}}/(2\sqrt{N})$ and is thus equal to $2^\rho/(2\rho)$. To obtain a fully homomorphic encryption scheme we therefore require that

$$\mathrm{rad}(c) < \frac{2^\rho}{4\rho},$$

where the extra factor of 2 comes from the fact that we made $r_{\mathrm{dec}}$ smaller by a factor of 2. It is easy to see that this bound is not satisfied for the practical parameter sizes given in Section 7. A similar complete analysis for the case $(s,t) = (5,7)$ gives a radius

$$\mathrm{rad}(c) = \frac{\rho^{880}}{(8!)^2 \cdot (4! \cdot 16!)^4},$$

which shows that for $\rho > 11680$ it is possible to obtain a fully homomorphic encryption scheme. This corresponds to $N > 136422400$, i.e. $n \approx 27$.
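The threshold can be recomputed by comparing base-2 logarithms, since $\rho^{880}$ overflows floating point at this size. The sketch below searches for the smallest integer $\rho \geq 2$ with $\log_2 \mathrm{rad}(c) < \rho - 2 - \log_2 \rho$, using the $(5,7)$ radius quoted above. The function names are illustrative.

```python
from math import factorial, log2


def log2_rad_c(rho):
    """log2 of rad(c) = rho**880 / ((8!)**2 * (4! * 16!)**4), the (s, t) = (5, 7) radius."""
    return 880 * log2(rho) - 2 * log2(factorial(8)) - 4 * log2(factorial(4) * factorial(16))


def log2_bound(rho):
    """log2 of the decryption bound 2**rho / (4 * rho)."""
    return rho - 2 - log2(rho)


def smallest_rho(start=2):
    """Smallest integer rho >= start with rad(c) < 2**rho / (4 * rho).

    Starting at 2 skips rho = 1, where log2(rho) = 0 and the inequality
    holds only trivially.
    """
    rho = start
    while log2_rad_c(rho) >= log2_bound(rho):
        rho += 1
    return rho


rho = smallest_rho()
print(rho, rho ** 2)  # 11680 136422400
```

The crossover lands at $\rho = 11680$, consistent with the text. Because the difference of the two sides is convex in $\rho$, no smaller $\rho \geq 2$ satisfies the bound.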
Abstract. Sanitizable signatures allow a designated party, called the sanitizer, to modify parts of signed data such that the immutable parts can still be verified with respect to the original signer. Ateniese et al. (ESORICS 2005) discuss five security properties for such signature schemes: unforgeability, immutability, privacy, transparency and accountability.

These notions have been formalized in a recent work by Brzuska et al. (PKC 2009), which also discusses the relationships among the security notions. In addition, they prove a modification of the scheme of Ateniese et al. to be secure according to these notions.

Here we argue that a sixth property of sanitizable signature schemes may be desirable: unlinkability. Basically, this property prevents one from linking sanitized message-signature pairs of the same document, which would allow one to deduce combined information about the original document.

We show that this notion implies privacy, the inability to recover the original data of sanitized parts, but is not implied by any of the other five notions. We also discuss a scheme based on group signatures meeting all six security properties.

1 Introduction 

For a regular signature scheme any modification of the message makes the signature for the modified message invalid. In some applications, though, it may be preferable to support message modifications such that one can still verify the authenticity of the immutable message part, and such that only authorized parties can make such changes. Signature schemes having this property are called sanitizable, as introduced by Ateniese et al. [1]. Related concepts have been discussed
concurrently in 1231231201 . 

Ateniese et al. [1] discuss the applicability of sanitizable signatures to the anonymization of medical data, the replacement of commercials in authenticated media streams, and updates of reliable routing information. They identified five desirable security properties for sanitizable signature schemes. Informally, these are:

Unforgeability. Says that no one except for the honest signer and sanitizer 
can create valid signatures. 

Immutability. Demands that even a malicious sanitizer cannot change message 
parts which have not been marked as modifiable by the signer. 

Privacy. Prevents an outsider from recovering the original data of sanitized message parts.


P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010, LNCS 6056, pp. 444 2010. 

(c) International Association for Cryptologic Research 2010 
Transparency. Covers the indistinguishability of signatures created by the
signer or the sanitizer. 

Accountability. Refers to the inability of a malicious signer or sanitizer to 
deny authorship. 

Brzuska et al. [3] formally define these five properties with game-based approaches and relate them, showing that accountability implies unforgeability and transparency implies privacy; all other properties are independent. They also prove a modification of the scheme by Ateniese et al. [1] to be secure according to these five properties.

Unlinkability. Here we argue that an additional property may be useful in some settings. We call this property unlinkability and motivate it by the following example (see also Figure 1). Assume that we have signed medical records and at some point we anonymize the data by redacting the personal information of the patients, such as names and addresses. At some other time, say for revenue reasons, we remove the actual medical treatments and leave only the personal information. Then one should not be able to link these data through the (sanitized) signatures and thereby reconstruct the full records. However, previous
schemes like the one by Brzuska et al. [3] and, for example, the ones in mrmo] 
in fact allow such attacks. They are usually based on chameleon hashes, which remain unchanged in the sanitization step and thus allow one to identify, through the hash value, two sanitized signatures derived from the same signature. Other constructions, like the one in [23], even come with an explicit document identifier, allowing one to link sanitized messages easily.



Fig. 1. Linkability problem 


We hence introduce a formal definition of unlinkability and relate it to the previously given notions. It turns out that unlinkability is not implied by any of the other properties, but conversely implies privacy. The reason is that privacy prevents an adversary from recovering the original data of sanitized parts, and a violation of this property would also enable the adversary to reconstruct, and thus easily link, messages.
Construction. We then present a construction of a sanitizable signature scheme obeying all six properties, including unlinkability. The idea is fundamentally different from previous approaches, which usually rely on chameleon hashes. In our case the signer first signs the fixed parts with a regular signature scheme. For the modifiable parts the signer and the sanitizer use a group signature scheme
iTTTj , i.e., a signature scheme which allows to sign anonymously on behalf of the 
group, but such that a group manager can reveal the identity of the user who has signed [5]. In our case the group only consists of the signer and the sanitizer, and the signer also acts as the group manager. If the sanitizer later changes (some of) the modifiable message parts, it can create a new group signature and replace the signer's group signature.

The anonymity of the group signature scheme in our context guarantees transparency (the indistinguishability of signatures originating from the signer and the sanitizer). The ability of the group manager (i.e., the signer in our case) to identify a group member supports sanitizer-accountability, i.e., the ability to provide a proof that the sanitizer has created the signature. Signer-accountability is provided by the non-frameability of the group signature scheme, which prevents a malicious group manager (i.e., the signer) from falsely accusing the sanitizer of being the source of a signature. Immutability follows from the unforgeability of the regular signature scheme for the fixed parts, and unlinkability from the fact that the sanitizer signs the entire message from scratch (the signature for the fixed message parts remains unchanged).

We remark that the actual construction needs a careful implementation of the idea above to make the derived sanitizable signature scheme satisfy all desired security properties. This is particularly true since the group signature schemes proposed in the literature, like rfiFfBS'Si'niTTjTmi, come with varying security features and set-up assumptions. In this version we thus present a simple, but not necessarily the most practical, approach to turn our idea into a secure sanitizable scheme. For example, following the definitions in [3], we do not rely on the public keys of the signer or sanitizer being registered, although this is most likely the case in practice. In the full version we discuss further variations, e.g., multiple sanitizers, or using a ring signature scheme instead of a group signature scheme, thus dropping the accountability requirement for the derived sanitizable scheme.

Our solution shows that, in general, sanitizable signatures can be built from 
group signatures, thereby providing a new application for the latter primitive. 
This relation also immediately gives a feasibility result for sanitizable signatures: 
Since the work by Bellare et al. [5] about group signatures proves that one can 
derive them from IND-CCA secure encryption, non-interactive zero-knowledge 
proofs and digital signatures, all known to exist given trapdoor permutations, 
it follows that one can also build secure sanitizable signatures from trapdoor 
permutations. 

Organization. In Section 2 we introduce the notion of sanitizable signatures and the security properties given in [1,3]. In Section 3 we discuss the notion of unlinkability and its relationship to the other security properties. In Section 4 we present our construction of a secure sanitizable scheme based on group signatures.
2 Preliminaries

In this section we revisit the notion of sanitizable signatures and the previously 
given security properties. 

2.1 Sanitizable Signatures 

In a sanitizable signature scheme both the signer and the sanitizer hold a key pair, $(sk_{sig}, pk_{sig})$ and $(sk_{san}, pk_{san})$ respectively. The signer can sign messages with its secret key $sk_{sig}$ and "attach" a description ADM of the modifications which are allowed to the sanitizer $pk_{san}$. The sanitizer can then later change such a message according to some modification MOD and update the signature using its secret key $sk_{san}$. In order to settle disputes about the origin of a message-signature pair, the algorithm Proof enables the signer to produce, from previously signed messages, a proof $\pi$ that a signature has been created by the sanitizer. This proof can then be verified with the help of the Judge algorithm. Judge only needs to decide about the origin of a valid message-signature pair; for invalid pairs such decisions are in general impossible.

To model admissible modifications we assume that ADM and MOD are (descriptions of) efficient deterministic algorithms such that MOD maps any message $m$ to the modified message $m' = \mathrm{MOD}(m)$, and $\mathrm{ADM}(\mathrm{MOD}) \in \{0,1\}$ indicates whether the modification is admissible and matches ADM, in which case $\mathrm{ADM}(\mathrm{MOD}) = 1$. For example, for messages $m = m[1] \ldots m[k]$ divided into blocks $m[i]$ of equal bit length $t$, we can let ADM contain $t$ and the indices of the modifiable blocks. MOD then essentially consists of pairs $(j, m'[j])$ defining the new value of the $j$-th block.

For ease of notation we let $\mathrm{FIX}_{\mathrm{ADM}}$ be an efficient deterministic algorithm which is uniquely determined by ADM and which maps $m$ to the immutable message part $\mathrm{FIX}_{\mathrm{ADM}}(m)$. For example, for block-divided messages $\mathrm{FIX}_{\mathrm{ADM}}(m)$ is the concatenation of all blocks not appearing in ADM. To exclude trivial examples we demand that admissible modifications leave the fixed part of a message unchanged, i.e., $\mathrm{FIX}_{\mathrm{ADM}}(m) = \mathrm{FIX}_{\mathrm{ADM}}(\mathrm{MOD}(m))$ for all $m \in \{0,1\}^*$ and all MOD with $\mathrm{ADM}(\mathrm{MOD}) = 1$. Analogously, to avoid choices like $\mathrm{FIX}_{\mathrm{ADM}}$ having empty output, we also require that the fixed part be "maximal" given ADM, i.e., $\mathrm{FIX}_{\mathrm{ADM}}(m') \neq \mathrm{FIX}_{\mathrm{ADM}}(m)$ for $m' \notin \{\mathrm{MOD}(m) \mid \mathrm{MOD} \text{ with } \mathrm{ADM}(\mathrm{MOD}) = 1\}$.
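The block-based example can be written out as a minimal Python sketch. The representation is a hypothetical choice made for illustration and is not the paper's encoding. Messages are tuples of equal-length string blocks rather than bit strings, MOD is a dictionary from 1-based block indices to new blocks, and `Adm` holds the block length $t$ and the modifiable indices.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Message = Tuple[str, ...]  # m = m[1] ... m[k], blocks of equal length
Mod = Dict[int, str]       # MOD as pairs (j, m'[j]) with 1-based j


@dataclass(frozen=True)
class Adm:
    """Hypothetical block-based ADM: block length t and modifiable indices."""
    block_len: int
    modifiable: FrozenSet[int]


def adm_accepts(adm: Adm, mod: Mod) -> bool:
    """ADM(MOD) = 1 iff MOD only replaces modifiable blocks by blocks of length t."""
    return all(j in adm.modifiable and len(block) == adm.block_len
               for j, block in mod.items())


def apply_mod(mod: Mod, m: Message) -> Message:
    """m' = MOD(m): block j is replaced by mod[j] where given."""
    return tuple(mod.get(j, block) for j, block in enumerate(m, start=1))


def fix(adm: Adm, m: Message) -> str:
    """FIX_ADM(m): concatenation of all blocks not appearing in ADM."""
    return "".join(block for j, block in enumerate(m, start=1)
                   if j not in adm.modifiable)


m = ("name", "addr", "diag")
adm = Adm(block_len=4, modifiable=frozenset({1, 2}))
mod = {1: "XXXX", 2: "YYYY"}
assert adm_accepts(adm, mod)
assert fix(adm, m) == fix(adm, apply_mod(mod, m)) == "diag"
```

The last assertion is the requirement that admissible modifications leave the fixed part unchanged.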

Jumping ahead, we note that for our construction based on group signatures we make another assumption on ADM. This property, called modification-decidability, allows one to decide efficiently, for given messages $m$, $m^*$ and ADM, whether or not $m^*$ is an admissible modification of $m$ with respect to ADM. This property is satisfied, for example, by the block-based approach. However, for our definitions of the security properties and their relationships we do not impose any restriction at this point.
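For the block-based approach, modification-decidability amounts to a linear-time comparison. The helper below is hypothetical. It takes ADM to be the block length $t$ together with the 1-based indices of the modifiable blocks, and it represents blocks as strings for readability.

```python
from typing import FrozenSet, Sequence


def is_admissible_modification(m: Sequence[str], m_star: Sequence[str],
                               block_len: int, modifiable: FrozenSet[int]) -> bool:
    """Decide whether m_star = MOD(m) for some MOD with ADM(MOD) = 1.

    For a block-based ADM such a MOD exists iff both messages have the same
    number of blocks, every block of m_star has length t, and m_star agrees
    with m on every fixed block. This takes time linear in the message length.
    """
    if len(m) != len(m_star) or any(len(block) != block_len for block in m_star):
        return False
    return all(old == new
               for j, (old, new) in enumerate(zip(m, m_star), start=1)
               if j not in modifiable)
```

For instance, with blocks `("name", "addr", "diag")` and modifiable indices `{1, 2}`, replacing the first block is accepted, but replacing the third block is rejected.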

The following definition is taken from [3]: 

Definition 1 (Sanitizable Signature Scheme). A sanitizable signature scheme SanSig consists of seven efficient algorithms $(\mathrm{KGen}_{sig}, \mathrm{KGen}_{san}, \mathrm{Sign}, \mathrm{Sanit}, \mathrm{Verify}, \mathrm{Proof}, \mathrm{Judge})$ such that:


448 


C. Brzuska et al. 


Key Generation. There are two key generation algorithms, one for the signer and one for the sanitizer. Both create a pair of keys, a private key and the corresponding public key:

$$(pk_{sig}, sk_{sig}) \leftarrow \mathrm{KGen}_{sig}(1^n), \qquad (pk_{san}, sk_{san}) \leftarrow \mathrm{KGen}_{san}(1^n)$$

Signing. The Sign algorithm takes as input a message $m \in \{0,1\}^*$, the secret key $sk_{sig}$ of the signer, the public key $pk_{san}$ of the sanitizer, as well as a description ADM of the admissibly modifiable message parts. It outputs a signature (or $\bot$, indicating an error):

$$\sigma \leftarrow \mathrm{Sign}(m, sk_{sig}, pk_{san}, \mathrm{ADM}).$$

We assume that ADM is recoverable from any signature $\sigma \neq \bot$.

Sanitizing. Algorithm Sanit takes a message $m \in \{0,1\}^*$, a modification instruction MOD, a signature $\sigma$, the public key $pk_{sig}$ of the signer and the secret key $sk_{san}$ of the sanitizer. It modifies the message $m$ according to the modification instruction MOD and determines a new signature $\sigma'$ for the modified message $m' = \mathrm{MOD}(m)$. Then Sanit outputs $m'$ and $\sigma'$ (or possibly $\bot$ in case of an error):

$$(m', \sigma') \leftarrow \mathrm{Sanit}(m, \mathrm{MOD}, \sigma, pk_{sig}, sk_{san})$$

Verification. The Verify algorithm outputs a bit $d \in \{\mathrm{true}, \mathrm{false}\}$ verifying the correctness of a signature $\sigma$ for a message $m$ with respect to the public keys $pk_{sig}$ and $pk_{san}$:

$$d \leftarrow \mathrm{Verify}(m, \sigma, pk_{sig}, pk_{san})$$

Proof. The Proof algorithm takes as input the secret signing key $sk_{sig}$, a message $m$ and a signature $\sigma$, as well as a set of (polynomially many) additional message-signature pairs $(m_i, \sigma_i)_{i=1,2,\ldots,q}$ and the public key $pk_{san}$. It outputs a string $\pi \in \{0,1\}^*$:

$$\pi \leftarrow \mathrm{Proof}(sk_{sig}, m, \sigma, (m_1, \sigma_1), \ldots, (m_q, \sigma_q), pk_{san})$$

Judge. Algorithm Judge takes as input a message $m$ and a valid signature $\sigma$, the public keys of the parties and a proof $\pi$. It outputs a decision $d \in \{\mathrm{Sig}, \mathrm{San}\}$ indicating whether the message-signature pair has been created by the signer or the sanitizer:

$$d \leftarrow \mathrm{Judge}(m, \sigma, pk_{sig}, pk_{san}, \pi)$$

For a sanitizable signature scheme the usual correctness properties should hold, saying that genuinely signed or sanitized messages are accepted and that a genuinely created proof by the signer leads the judge to decide in favor of the signer. For a formal approach to correctness see [3].
2.2 Security of Sanitizable Signatures

Here we recall the security notions for sanitizable signatures given by Brzuska et al. [3]. We note that they show there that signer- and sanitizer-accountability together imply unforgeability, and that transparency implies privacy. Hence, in principle it suffices to show immutability, accountability and transparency. We therefore omit the formal definitions of unforgeability and privacy here and refer the reader to the full version of the paper.


Immutability. This property informally demands that a malicious sanitizer cannot change inadmissible blocks. In the attack model below, the malicious sanitizer $\mathcal{A}$ interacts with the signer to receive signatures $\sigma_i$ for messages $m_i$, descriptions $\mathrm{ADM}_i$ and keys $pk_{san,i}$ of its choice. It eventually outputs a valid triple $(pk^*_{san}, m^*, \sigma^*)$ such that the message $m^*$ is not a "legitimate" transformation of one of the $m_i$'s under the same key $pk^*_{san} = pk_{san,i}$. The latter is formalized by requiring that, for each query, $pk^*_{san} \neq pk_{san,i}$ or $m^* \notin \{\mathrm{MOD}(m_i) \mid \mathrm{MOD} \text{ with } \mathrm{ADM}_i(\mathrm{MOD}) = 1\}$ for the value $\mathrm{ADM}_i$ in $\sigma_i$. For block-divided messages, for example, this means that $m^*$ and $m_i$ differ in at least one inadmissible block. As the adversary can query the signer for several sanitizer keys $pk_{san}$, the security definition also covers the case where the signer interacts with several sanitizers simultaneously.


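Definition 2 (Immutability). A sanitizable signature scheme SanSig is immutable if for any efficient algorithm $\mathcal{A}$ the probability that the following experiment $\mathrm{Immutability}^{\mathrm{SanSig}}_{\mathcal{A}}(n)$ returns 1 is negligible (as a function of $n$).

Experiment $\mathrm{Immutability}^{\mathrm{SanSig}}_{\mathcal{A}}(n)$
  $(pk_{sig}, sk_{sig}) \leftarrow \mathrm{KGen}_{sig}(1^n)$
  $(pk^*_{san}, m^*, \sigma^*) \leftarrow \mathcal{A}^{\mathrm{Sign}(\cdot, sk_{sig}, \cdot, \cdot)}(pk_{sig})$
  letting $(m_i, \mathrm{ADM}_i, pk_{san,i})$ and $\sigma_i$ for $i = 1, 2, \ldots, q$
    denote the queries and answers to and from oracle Sign,
  return 1 if
    $\mathrm{Verify}(m^*, \sigma^*, pk_{sig}, pk^*_{san}) = \mathrm{true}$ and
    for all $i = 1, 2, \ldots, q$ we have
      $pk^*_{san} \neq pk_{san,i}$ or
      $m^* \notin \{\mathrm{MOD}(m_i) \mid \mathrm{MOD} \text{ with } \mathrm{ADM}_i(\mathrm{MOD}) = 1\}$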


Accountability. Accountability says that the origin of a (sanitized) signature should be undeniable. There are two types of accountability. Sanitizer-accountability says that, if a message has not been signed by the signer, then even a malicious sanitizer should not be able to make the judge accuse the signer. Signer-accountability says that, if a message and its signature have not been sanitized, then even a malicious signer should not be able to make the judge accuse the sanitizer.

In the sanitizer-accountability game let $\mathcal{A}_{\mathrm{Sanit}}$ be an efficient adversary playing the role of the malicious sanitizer. Adversary $\mathcal{A}_{\mathrm{Sanit}}$ has access to a Sign and a Proof oracle. Her task is to output a valid message-signature pair $m^*, \sigma^*$ together with a key $pk^*_{san}$, where $(pk^*_{san}, m^*)$ must differ from all pairs $(pk_{san,i}, m_i)$ previously queried to the Sign oracle. She wins if the proof produced by the signer via Proof still leads the judge to decide "Sig", i.e., that the signature has been created by the signer.


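Definition 3 (Sanitizer-Accountability). A sanitizable signature scheme SanSig is called sanitizer-accountable if for any efficient $\mathcal{A}_{\mathrm{Sanit}}$ the probability that the following experiment $\mathrm{San\text{-}Accountability}^{\mathrm{SanSig}}_{\mathcal{A}_{\mathrm{Sanit}}}(n)$ returns 1 is negligible (as a function of $n$).

Experiment $\mathrm{San\text{-}Accountability}^{\mathrm{SanSig}}_{\mathcal{A}_{\mathrm{Sanit}}}(n)$
  $(pk_{sig}, sk_{sig}) \leftarrow \mathrm{KGen}_{sig}(1^n)$
  $(pk^*_{san}, m^*, \sigma^*) \leftarrow \mathcal{A}_{\mathrm{Sanit}}^{\mathrm{Sign}(\cdot, sk_{sig}, \cdot, \cdot),\, \mathrm{Proof}(sk_{sig}, \ldots)}(pk_{sig})$
  letting $(m_i, \mathrm{ADM}_i, pk_{san,i})$ and $\sigma_i$ for $i = 1, 2, \ldots, q$
    denote the queries and answers to and from oracle Sign
  $\pi \leftarrow \mathrm{Proof}(sk_{sig}, m^*, \sigma^*, (m_1, \sigma_1), \ldots, (m_q, \sigma_q), pk^*_{san})$
  return 1 iff
    $(pk^*_{san}, m^*) \neq (pk_{san,i}, m_i)$ for all $i = 1, 2, \ldots, q$, and
    $\mathrm{Verify}(m^*, \sigma^*, pk_{sig}, pk^*_{san}) = \mathrm{true}$, and
    $\mathrm{Judge}(m^*, \sigma^*, pk_{sig}, pk^*_{san}, \pi) = \mathrm{Sig}$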


In the signer-accountability game a malicious signer $\mathcal{A}_{\mathrm{Sign}}$ gets a public sanitizing key $pk_{san}$ as input. She is allowed to query a sanitizing oracle about tuples $(m_i, \mathrm{MOD}_i, \sigma_i, pk_{sig,i})$, receiving answers $(m'_i, \sigma'_i)$. Adversary $\mathcal{A}_{\mathrm{Sign}}$ finally outputs a tuple $(pk^*_{sig}, m^*, \sigma^*)$ together with a proof $\pi^*$. She is considered to succeed if Judge accuses the sanitizer for the new key-message pair $(pk^*_{sig}, m^*)$ with a valid signature $\sigma^*$.

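Definition 4 (Signer-Accountability). A sanitizable signature scheme SanSig is called signer-accountable if for any efficient $\mathcal{A}_{\mathrm{Sign}}$ the probability that the following experiment $\mathrm{Sig\text{-}Accountability}^{\mathrm{SanSig}}_{\mathcal{A}_{\mathrm{Sign}}}(n)$ returns 1 is negligible (as a function of $n$):

Experiment $\mathrm{Sig\text{-}Accountability}^{\mathrm{SanSig}}_{\mathcal{A}_{\mathrm{Sign}}}(n)$
  $(pk_{san}, sk_{san}) \leftarrow \mathrm{KGen}_{san}(1^n)$
  $(pk^*_{sig}, m^*, \sigma^*, \pi^*) \leftarrow \mathcal{A}_{\mathrm{Sign}}^{\mathrm{Sanit}(\cdot, \cdot, \cdot, \cdot, sk_{san})}(pk_{san})$
  letting $(m'_i, \sigma'_i)$ for $i = 1, 2, \ldots, q$
    denote the answers from oracle Sanit,
  return 1 iff
    $(pk^*_{sig}, m^*) \neq (pk_{sig,i}, m'_i)$ for all $i = 1, 2, \ldots, q$, and
    $\mathrm{Verify}(m^*, \sigma^*, pk^*_{sig}, pk_{san}) = \mathrm{true}$, and
    $\mathrm{Judge}(m^*, \sigma^*, pk^*_{sig}, pk_{san}, \pi^*) = \mathrm{San}$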


Transparency. We define transparency by the following adversarial game. We consider an adversary $\mathcal{A}$ with access to Sign, Sanit and Proof oracles, with which the adversary can create signatures for (sanitized) messages and learn proofs. In addition, $\mathcal{A}$ gets access to a Sanit/Sign box which contains a secret random bit $b \in \{0,1\}$ and which, on input a message $m$, a modification information MOD and a description ADM,
— for $b = 0$ runs the signer algorithm to create $\sigma \leftarrow \mathrm{Sign}(m, sk_{sig}, pk_{san}, \mathrm{ADM})$, then runs the sanitizer algorithm and returns the sanitized message $m'$ with the new signature $\sigma'$, and

— for $b = 1$ acts as in the case $b = 0$ but also signs $m'$ from scratch with the signing algorithm to create a signature $\sigma'$, and returns the pair $(m', \sigma')$.

Adversary $\mathcal{A}$ eventually produces an output $a$, the guess for $b$. A sanitizable signature scheme is now said to be transparent if for all efficient algorithms $\mathcal{A}$ the probability of a correct guess $a = b$ in the above game is negligibly close to $\frac{1}{2}$. Below we also define a relaxed version called proof-restricted transparency and discuss the idea after the definition.

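Definition 5 ((Proof-Restricted) Transparency). A sanitizable signature scheme SanSig is (proof-restricted) transparent if for any efficient algorithm $\mathcal{A}$ the probability that the following experiment $\mathrm{Transparency}^{\mathrm{SanSig}}_{\mathcal{A}}(n)$ returns 1 is negligibly close to $\frac{1}{2}$.

Experiment $\mathrm{Transparency}^{\mathrm{SanSig}}_{\mathcal{A}}(n)$
  $(pk_{sig}, sk_{sig}) \leftarrow \mathrm{KGen}_{sig}(1^n)$
  $(pk_{san}, sk_{san}) \leftarrow \mathrm{KGen}_{san}(1^n)$
  $b \leftarrow \{0,1\}$
  $a \leftarrow \mathcal{A}^{\mathrm{Sign}(\cdot, sk_{sig}, \cdot, \cdot),\, \mathrm{Sanit}(\cdot, \cdot, \cdot, \cdot, sk_{san}),\, \mathrm{Proof}(sk_{sig}, \ldots),\, \mathrm{Sanit/Sign}(\cdot, \cdot, \cdot, sk_{sig}, sk_{san}, pk_{sig}, pk_{san}, b)}$
    with input $(pk_{sig}, pk_{san})$
  where oracle Sanit/Sign, on input $m_k, \mathrm{MOD}_k, \mathrm{ADM}_k$,
    first computes $\sigma_k \leftarrow \mathrm{Sign}(m_k, sk_{sig}, pk_{san}, \mathrm{ADM}_k)$,
    then computes $(m'_k, \sigma'_k) \leftarrow \mathrm{Sanit}(m_k, \mathrm{MOD}_k, \sigma_k, pk_{sig}, sk_{san})$,
    then, if $b = 1$, replaces $\sigma'_k$ by $\sigma'_k \leftarrow \mathrm{Sign}(m'_k, sk_{sig}, pk_{san}, \mathrm{ADM}_k)$,
    and finally returns $(m'_k, \sigma'_k)$.
  return 1 iff
    $a = b$
    (and, in the proof-restricted case, $\mathcal{A}$ has not queried
    any $m'_k$ output by Sanit/Sign to Proof)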
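The Sanit/Sign oracle of the experiment can be sketched as a closure over the secret bit $b$. This is an illustration under stated assumptions. The sketch takes `sign` and `sanit` as generic callables with the argument order of Definition 1, and it treats MOD as a callable on messages. Any concrete scheme plugged in is up to the reader.

```python
import secrets
from typing import Any, Callable, Tuple


def make_sanit_sign_oracle(sign: Callable[..., Any],
                           sanit: Callable[..., Tuple[Any, Any]],
                           sk_sig: Any, pk_sig: Any, sk_san: Any, pk_san: Any):
    """Return the Sanit/Sign oracle of the transparency experiment and its bit b.

    sign(m, sk_sig, pk_san, adm) and sanit(m, mod, sigma, pk_sig, sk_san) follow
    the argument order of Definition 1; they are placeholders for a concrete scheme.
    """
    b = secrets.randbits(1)

    def oracle(m_k, mod_k, adm_k):
        sigma_k = sign(m_k, sk_sig, pk_san, adm_k)
        m_prime, sigma_prime = sanit(m_k, mod_k, sigma_k, pk_sig, sk_san)
        if b == 1:
            # Replace the sanitized signature by a fresh signature on m'.
            sigma_prime = sign(m_prime, sk_sig, pk_san, adm_k)
        return m_prime, sigma_prime

    return oracle, b
```

Note that the oracle returns the same message $m'$ for both values of $b$. The adversary can therefore only learn $b$ from the signature itself or, without the proof restriction, by submitting $m'$ to Proof.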

The original definition of Brzuska et al. [3] does not consider the proof-restricted case. Without this restriction, though, achieving transparency at first seems to be impossible, because the adversary can then always submit the replies of the Sanit/Sign oracle to the Proof oracle and thereby recover the secret bit $b$. However, in their construction the Proof algorithm searches the list of previously signed messages and only gives a useful answer if it finds a match, enabling transparency without this restriction. Yet any solution (like ours here) where the Proof algorithm is "history-free" can only achieve the proof-restricted version. Note that Proof algorithms forgoing the set of previously signed messages are, of course, preferable from an efficiency point of view.

As for the implications among the security notions [3], we note that proof-restricted transparency only implies a proof-restricted form of privacy, in which the answer messages of the LoRSanit oracle cannot be submitted to the Proof oracle either. However, since we show in the next section that unlinkability implies full privacy and our construction achieves unlinkability, our scheme is also private in the non-restricted sense. We note that all the separation results in [3] remain valid for proof-restricted transparency.

3 Unlinkability 

In this section we define unlinkability formally and discuss its relationship to the 
other security notions. 

3.1 Definition 

As explained in the introduction, unlinkability refers to the impossibility of using the signatures to identify sanitized message-signature pairs originating from the same source. Technically, we use an indistinguishability-based approach to define this property. Given a signature for a sanitized message that stems from one of two possible sources, the adversary should not be able to predict the actual original message better than by guessing. This should hold even if the adversary herself provides the two source message-signature pairs and the modifications, one of which is then sanitized. The stipulation here is that the two modifications yield the same sanitized message. Otherwise, if for example the sanitized messages still contain some unique but distinct entry, then predicting the source is of course easy. This, however, is beyond the scope of signature schemes: the scheme should only prevent signatures from being used to link data.

Formally, we use a game-based approach to define unlinkability, similar to the other security notions in [3]. The adversary is given access to a signing oracle and a sanitizer oracle. She also gets a proof oracle, since this step depends on the signer's secret key and may leak valuable information. In addition, the adversary may query a left-or-right oracle LoRSanit, which is initialized with a secret random bit $b$. In each of the multiple queries to LoRSanit the adversary provides a pair of tuples, each consisting of a message, a modification and a valid signature. The recoverable description of admissible modifications must be identical in both tuples. Since ADM is recoverable from a signature, providing distinct descriptions ADM would allow a trivial attack, and so would the case that only one signature is valid. Depending on the bit $b$, the adversary gets the sanitized message-signature pair of either the left or the right input pair. The adversary's goal is to eventually predict the bit $b$ significantly better than with the guessing probability of $\frac{1}{2}$.

Definition 6 (Unlinkability). A sanitizable signature scheme SanSig is unlinkable 
if for any efficient algorithm A the probability that the following experiment 
Unlinkability^{SanSig}_A(n) returns 1 is negligibly close to 1/2. 

Experiment Unlinkability^{SanSig}_A(n) 
  (pk_sig, sk_sig) ← KGen_sig(1^n) 
  (pk_san, sk_san) ← KGen_san(1^n) 
  b ← {0,1} 
  a ← A^{Sign(sk_sig, ...), Sanit(sk_san, ...), Proof(sk_sig, ...), LoRSanit(sk_sig, sk_san, b, ...)}(pk_sig, pk_san) 
    where oracle LoRSanit(sk_sig, sk_san, b, ...), on input 
    (m_{j,0}, MOD_{j,0}, σ_{j,0}, m_{j,1}, MOD_{j,1}, σ_{j,1}) with recoverable ADM_{j,0} = ADM_{j,1}, 
    Verify(m_{j,0}, σ_{j,0}, pk_sig, pk_san) = true, Verify(m_{j,1}, σ_{j,1}, pk_sig, pk_san) = true, 
    and MOD_{j,0}(m_{j,0}) = MOD_{j,1}(m_{j,1}), i.e., both inputs are mapped to the same modified message, 
    returns (m', σ') ← Sanit(m_{j,b}, MOD_{j,b}, σ_{j,b}, pk_sig, sk_san) 
  return 1 if a = b. 
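The control flow of the experiment can be exercised with a small Python harness. This is a toy sketch under loud assumptions: the "signatures" (ToySig, LinkableToy and RefreshingToy are hypothetical names) carry no cryptographic security, messages are tuples of blocks with ADM given as a set of changeable indices, and the adversary calls the toy signing algorithm directly instead of a keyed Sign oracle. Only the LoRSanit checks mirror the definition: distinct ADM, invalid signatures, inadmissible modifications and differing sanitized messages are all rejected.

```python
import secrets
from dataclasses import dataclass

# Toy stand-ins only: no security, just enough structure to run the game.

@dataclass(frozen=True)
class ToySig:
    adm: frozenset   # indices of admissible (changeable) blocks
    tag: bytes       # scheme-specific signature material

def apply_mod(msg, mod):
    """Apply MOD, given as {block_index: new_block}, to a tuple of blocks."""
    out = list(msg)
    for i, v in mod.items():
        out[i] = v
    return tuple(out)

def admissible(adm, mod):
    return set(mod) <= adm

class LinkableToy:
    """Carries a per-signature identifier through sanitization (cf. Proposition 1)."""
    def sign(self, msg, adm):
        return ToySig(frozenset(adm), secrets.token_bytes(16))
    def verify(self, msg, sig):
        return True  # toy: every signature is accepted
    def sanit(self, msg, mod, sig):
        return apply_mod(msg, mod), sig

class RefreshingToy(LinkableToy):
    """The sanitizer draws fresh signature material, so no identifier survives."""
    def sanit(self, msg, mod, sig):
        return apply_mod(msg, mod), ToySig(sig.adm, secrets.token_bytes(16))

def make_lor_sanit(scheme, b):
    """LoRSanit oracle with secret bit b; rejects queries allowing trivial wins."""
    def lor_sanit(m0, mod0, s0, m1, mod1, s1):
        if s0.adm != s1.adm:
            return None
        if not (scheme.verify(m0, s0) and scheme.verify(m1, s1)):
            return None
        if not (admissible(s0.adm, mod0) and admissible(s1.adm, mod1)):
            return None
        if apply_mod(m0, mod0) != apply_mod(m1, mod1):
            return None
        m, mod, s = (m0, mod0, s0) if b == 0 else (m1, mod1, s1)
        return scheme.sanit(m, mod, s)
    return lor_sanit

def tag_matching_adversary(scheme, lor_sanit):
    """Signs two sources, requests one sanitization to a common message, compares tags."""
    m0, m1 = ("alice", "flu"), ("bob", "flu")
    s0, s1 = scheme.sign(m0, {0}), scheme.sign(m1, {0})
    mod = {0: "REDACTED"}
    _, s_prime = lor_sanit(m0, mod, s0, m1, mod, s1)
    return 0 if s_prime.tag == s0.tag else 1

def unlinkability_game(scheme, adversary):
    b = secrets.randbelow(2)
    a = adversary(scheme, make_lor_sanit(scheme, b))
    return int(a == b)
```

With LinkableToy the adversary wins every game, while with RefreshingToy it always outputs 1 and therefore wins only about half of the time, i.e., at guessing probability.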

A pictorial description is given in Figure 2. We note that the definition above is, 
for example, robust with respect to several sanitization steps in the LoRSanit oracle. 
That is, we could allow the adversary in the experiment above to submit arbitrarily 
long "modification chains" MOD^1_0, ..., MOD^t_0 and MOD^1_1, ..., MOD^t_1 
such that the two source documents are gradually sanitized with a match in the 
resulting documents. Still, predicting b remains hard, as such chains can be 
simulated by calling the sanitizer oracle for the first t − 1 modifications 
manually, before entering the final sanitization step into the LoRSanit oracle. 



Fig. 2. Unlinkability. A wins if it outputs a = b. 

Recall the example of medical records which are sanitized twice, one time 
basically removing the personal information and the other time removing the 
medical data. Our notion of unlinkability can then be used to show that such 
sanitized message-signature pairs do not allow one to reconstruct the full data 
better than by guessing. Assume for simplicity that we only have two records with 
entries (name#0, data#0) and (name#1, data#1). Then we create all four possible 
combinations (name#a, data#b) for a, b ∈ {0,1} and ask for signatures for them 
(with both parts being admissibly changeable). For each a ∈ {0,1} we then 
insert the pairs (name#a, data#0) and (name#a, data#1) twice into the LoRSanit 
oracle, one time cutting off the name part, the other time removing the data part. 
Altogether we thus make four calls to the LoRSanit oracle, and we hand those 
four replies to the adversary. Our unlinkability definition says that one cannot 
distinguish the two cases (left or right sanitization) better than by guessing, thus 
also preventing one from telling which data belong to which name. 

Our definition above is for unlinkability with respect to message-signature 
pairs sanitized by the same sanitizer. One can easily extend the above definition 
by letting the adversary also choose different sanitizers for the left and for the 
right input data. But then both sanitizers must have been declared to have the 
permission to sanitize; otherwise one could easily determine the secret bit of the 
LoRSanit oracle by picking an invalid sanitizer for one of the input tuples. 

3.2 Relationships of the Security Notions 

We first show that unlinkability does not follow from any of the other security 
requirements. Then we prove that unlinkability implies privacy, and finally discuss 
that unlinkability does not imply any of the other properties. 

Proposition 1 (Independence of Unlinkability). Assume that there exists 
a sanitizable signature scheme (obeying one or more of the properties unforgeability, 
immutability, privacy, (proof-restricted) transparency, signer-accountability 
and sanitizer-accountability). Then there exists a sanitizable signature scheme 
which is not unlinkable but preserves the other security properties. 

The proof follows by simply appending a unique identifier id to each signature. 
This does not destroy any of the other security properties but clearly violates 
unlinkability. The proof of the following is straightforward, as the privacy 
experiment is essentially the unlinkability experiment with less control for the 
adversary: 

Proposition 2 (Unlinkability Implies Privacy). Any unlinkable sanitizable 
signature scheme is also private. 

With the next proposition we show that unlinkability does not imply any of 
the other security properties (assuming that we start with a secure sanitizable 
signature scheme like the one we construct in the next section): 

Proposition 3 (Independence of Other Properties). Assume that there 
exists a sanitizable signature scheme which is unforgeable, immutable, private, 
(proof-restricted) transparent, signer-accountable, sanitizer-accountable and 
unlinkable. Then for any of the properties immutability, transparency, unlinkability, 
signer-accountability and sanitizer-accountability, there exists a sanitizable 
signature scheme obeying all properties except for the one in question. 
Proof. The fact that unlinkability does not follow from the other properties has 
already been shown in Proposition 1. For the other properties we remark that 
the counterexamples in [3] which separate immutability, transparency, 
signer-accountability and sanitizer-accountability from the other properties also 
preserve unlinkability in each case (and also hold for proof-restricted transparency). 

□ 


4 Constructions Based on Group Signatures 

In this section we present our unlinkable sanitizable signature scheme (which also 
satisfies the other security properties). As explained in the introduction, the idea 
is to use a group signature scheme for the group consisting of the signer and the 
sanitizer, such that the signer signs the immutable message part with a regular 
signature scheme and the full message with the group signature scheme. The 
sanitizer can then update the full message and only sign this second component. 
The signer also takes over the role of the group manager in order to provide 
accountability. 

4.1 Group Signatures 

Group signatures, introduced by Chaum and van Heyst [13], allow a set of users 
to sign on behalf of the group such that outsiders cannot distinguish between 
different signers (anonymity) but such that a group manager can trace the signer's 
identity (traceability). We follow the formal framework of Bellare et al. [5] but 
add the non-frameability requirement pj] that even the group manager cannot 
sign on behalf of the users. Recall that this is necessary for the accountability 
in our sanitizable signature scheme, where the signer acts as the group manager 
and should not be able to make the judge falsely accuse the sanitizer. 

We briefly recall group signature schemes and their security properties. For 
comprehensive definitions see the full version of the paper and [5]. A group 
signature scheme GS consists of six efficient algorithms GS = (GKGen, UKGen, 
GSig, GVf, Open, GJudge) where 

— (sk_user,i, pk_user,i) ← UKGen(1^n) generates individual user key pairs, 

— (gmsk, gpk, cert) ← GKGen(1^n, gpk_user) takes the tuple gpk_user of the 
users' public keys and generates a group manager secret key gmsk, a group 
public key gpk and an individual certificate cert_i for each user, where cert 
designates the tuple of all cert_i, 

— σ ← GSig(sk_user,i, cert_i, gpk, m) signs a message m given the user's secret 
data sk_user,i, cert_i and the group's public key gpk, 

— GVf(gpk, m, σ) checks whether σ is a valid group signature for m under gpk, 

— (i, π) ← Open(gmsk, m, σ, gpk_user, gpk), on input a message m and signature 
σ, returns the index i of the alleged signer and a proof π for this claim, and 

— GJudge(m, σ, i, π, gpk, gpk_user) either confirms the accusation or denies it. 

There are three security properties for group signatures (5|SI : 
Anonymity. One cannot tell from a group signature who signed a message, even 
if one knows the secret data of the user and can ask the group manager to reveal 
the identities for other signatures. 

Traceability. A malicious user cannot falsely accuse an honest user of being the 
signer of a message, even if the malicious user is allowed to see other signatures 
generated by this honest user and can call the group manager. 

Non-Frameability. Strengthens traceability in the sense that even if the 
malicious user colludes with the group manager, they cannot frame an honest 
user. 

Definition 7 (Secure Group Signature). We call a group signature scheme 
secure if it is anonymous and non-frameable. 

Note that we tailor the group signature definitions to our needs, thereby adding 
non-frameability, making the scheme syntax free of setup sessions and relaxing the 
security model concerning some technical issues which are discussed in the full 
version of this paper. As for instantiations, we remark that the (generic) 
construction by Bellare et al. [5] satisfies our adapted definitions. As for more 
efficient group signature schemes, we can implement our sanitizable signature 
scheme with other group signature schemes like |22I14I18IT!?] . Yet, these group 
signature schemes need additional set-up assumptions like a trusted party generating 
common parameters or interactive registration of users. Our sanitizable signature 
scheme then inherits these characteristics (recall that, in practice, registration 
of signer and sanitizer keys is, for example, necessary to provide meaningful 
accountability). 

4.2 Construction 

In this section we show that the new security requirement of unlinkability can be 
achieved in combination with the five established security properties formalized 
in [3]. Recall that we sign the entire message, including the modifiable parts, with 
the group signature scheme, and —in order to prevent inadmissible changes— 
the signer also signs the fixed part with a regular scheme. This requires some care 
because if we take an arbitrary signature scheme then the signature itself may 
act as a unique identifier, even for messages with identical fixed parts. Thereby, 
unlinkability would be violated. 

The solution is to use a secure deterministic signature scheme for the fixed 
part (such that the signature is identical for messages with the same fixed part). 
Alternatively, one can deploy a rerandomizable signature scheme such that the 
sanitizer can rerandomize the signature, excising the link to the input signature. 
Below we use the "deterministic solution" for simplicity, and since every secure 
signature scheme can be easily turned into a deterministic one via pseudorandom 
functions [15]. 
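To illustrate the deterministic route, the sketch below derandomizes a Schnorr-type signature by deriving its nonce as PRF(k, m), with HMAC-SHA256 standing in for the pseudorandom function. This is an assumed instantiation chosen for brevity, not the scheme used in the paper: the group parameters (p = 2039, q = 1019, g = 4) are toy-sized, and the sketch only demonstrates determinism. Whether a concrete scheme is strongly unforgeable has to be established separately.

```python
import hashlib
import hmac
import secrets

# Toy Schnorr group: P = 2Q + 1 with Q prime; G = 4 generates the order-Q subgroup.
# Far too small for real use; chosen only so the example runs instantly.
P, Q, G = 2039, 1019, 4

def keygen():
    x = 1 + secrets.randbelow(Q - 1)       # signing exponent
    prf_key = secrets.token_bytes(32)      # PRF key k (HMAC-SHA256 assumed as PRF)
    return (x, prf_key), pow(G, x, P)

def _challenge(R, msg):
    digest = hashlib.sha256(R.to_bytes(2, "big") + msg).digest()
    return int.from_bytes(digest, "big") % Q

def sign(sk, msg):
    x, prf_key = sk
    # Derandomization: the nonce is PRF(k, msg) instead of fresh coins,
    # so signing the same message twice yields the same signature.
    prf_out = hmac.new(prf_key, msg, hashlib.sha256).digest()
    r = 1 + int.from_bytes(prf_out, "big") % (Q - 1)
    R = pow(G, r, P)
    e = _challenge(R, msg)
    return e, (r + x * e) % Q

def verify(pk, msg, sig):
    e, s = sig
    # G^s * pk^(-e) = G^r; pk has order Q, so pk^(-e) = pk^(Q - e).
    R = (pow(G, s, P) * pow(pk, (Q - e) % Q, P)) % P
    return e == _challenge(R, msg)
```

Any deterministic derivation of the coins from a secret key and the message gives the same effect; HMAC is used here only because it is readily available in the standard library.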

For a formal definition of strongly unforgeable signature schemes see HZ], 
We need this unforgeability notion (saying that one cannot even find a new 
signature for a previously signed message) to provide unlinkability. Examples of 
signature schemes achieving this strong notion are mmm■ Moreover, it is 
possible to obtain a strongly unforgeable signature scheme out of any unforgeable 
signature scheme applying the transformation of Bellare and Shoup 0 • Applying 
the transformation of [15] one can then make such schemes also deterministic. 

Recall that the idea behind our scheme is that for each signature the signer 
uses a group manager key, creates a certified user key to sign the modifiable 
parts, and certifies the sanitizer's public key as a group member to support 
modifications. But since our definition of sanitizable signatures demands 
state-free solutions, the signer formally cannot store the group manager key for this 
sanitizer and would need to create a new one for each call. We bypass this as 
follows: for each signing request, which includes a public key pk_san of the 
sanitizer, we let the signer create the group manager's keys etc. via the 
corresponding group signature algorithms, but provide the randomness for these 
algorithms by applying a pseudorandom function to pk_san (see [16] for a definition 
of pseudorandom functions). By this, we end up with (almost) independent keys for 
different sanitizers, but use consistent parameters for each sanitizer. For the same 
reason we also include the group membership certificate of the sanitizer in the 
signature, although it would usually be given directly to the sanitizer. As a side 
effect, since the group manager's public key is tied to the sanitizer in question, we 
only need group signatures with static joins. 
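The stateless re-derivation of the group keys can be sketched as follows. Everything here is hypothetical: HMAC-SHA256 plays the PRF, and ukgen and gkgen are placeholder key generators that merely consume their coins deterministically (Python's random.Random is not a cryptographic generator and only serves to make key generation a function of its coins). The point is that the signer stores no per-sanitizer state: the same pk_san always yields the same keys and certificates, while different sanitizers get unrelated ones.

```python
import hashlib
import hmac
import random
import secrets

def prf(k: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 as the PRF (an assumption; the construction needs only some PRF)."""
    return hmac.new(k, data, hashlib.sha256).digest()

def ukgen(coins: bytes):
    """Hypothetical stand-in for UKGen(1^n; coins): deterministic in its coins."""
    rng = random.Random(coins)  # NOT cryptographic; illustrates determinism only
    sk = rng.getrandbits(128)
    return sk, hashlib.sha256(sk.to_bytes(16, "big")).hexdigest()

def gkgen(user_pks, coins: bytes):
    """Hypothetical stand-in for GKGen(1^n, gpk_user; coins)."""
    rng = random.Random(coins)
    gmsk = rng.getrandbits(128)
    gpk = hashlib.sha256(repr((gmsk, user_pks)).encode()).hexdigest()
    certs = tuple(hashlib.sha256(repr((gmsk, pk)).encode()).hexdigest() for pk in user_pks)
    return gmsk, gpk, certs

def derive_group(k: bytes, pk_san: bytes):
    """Recompute the signer's group keys for one sanitizer from k and pk_san alone."""
    gsk_sig, gpk_sig = ukgen(prf(k, b"\x00" + pk_san))            # coins PRF(k, 0||pk_san)
    gmsk, gpk, (cert_sig, cert_san) = gkgen(
        (gpk_sig, pk_san.hex()), prf(k, b"\x01" + pk_san))        # coins PRF(k, 1||pk_san)
    return gsk_sig, gmsk, gpk, cert_sig, cert_san
```

Both Sign and Proof can call such a derivation independently and arrive at the same gmsk and gpk, which is what allows Proof to check gpk' = gpk and cert'_san = cert_san.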

Construction 1 (Sanitizable Signature Scheme). Let S = (SKGen, SSign, SVf) be 
a (regular) signature scheme, let GS = (GKGen, UKGen, GSig, GVf, Open, GJudge) 
be a group signature scheme, and let PRF = (KGen_prf, PRF) be a pseudorandom 
function. Define the sanitizable signature scheme 
SanSig = (KGen_sig, KGen_san, Sign, Sanit, Verify, Proof, Judge) as follows: 

Key Generation. First, algorithm KGen_sig gets the input 1^n and runs 
(ssk, spk) ← SKGen(1^n) to create a key pair for the signature scheme, and 
then also invokes k ← KGen_prf(1^n) to derive a key for the pseudorandom 
function. It outputs (sk_sig, pk_sig) = ((ssk, k), spk). Algorithm KGen_san(1^n) 
generates a key pair (sk_san, pk_san) = (gsk_san, gpk_san) ← UKGen(1^n) of the 
group signature scheme. 

Signing. Algorithm Sign on input m ∈ {0,1}*, sk_sig = (ssk, k), pk_san, ADM sets 
m_FIX = FIX_ADM(m) for the algorithm FIX_ADM determined by ADM. It runs the 
user key generation algorithm 

  (gsk_sig, gpk_sig) ← UKGen(1^n; PRF(k, 0||pk_san)) 

for randomness PRF(k, 0||pk_san) and afterwards the group manager algorithm 
to compute 

  (gmsk, gpk, cert_sig, cert_san) ← GKGen(1^n, (gpk_sig, pk_san); PRF(k, 1||pk_san)) 

for randomness PRF(k, 1||pk_san). It computes 

  σ_FIX = SSign(ssk, (m_FIX, ADM, pk_san, gpk)) and 
  σ_FULL = GSig(gsk_sig, cert_sig, (m, pk_sig), gpk) 

using the signing algorithms of the regular and of the group signature scheme. 
The algorithm finally returns σ = (σ_FIX, σ_FULL, ADM, pk_san, cert_san, gpk). 
Sanitizing. Algorithm Sanit on input a message m, information MOD, a signature 
σ = (σ_FIX, σ_FULL, ADM, pk_san, cert_san, gpk), and keys pk_sig and sk_san first 
recovers m_FIX = FIX_ADM(m). It then checks that MOD is admissible according 
to ADM and that σ_FIX is a valid signature for message (m_FIX, ADM, pk_san, gpk) 
under key spk. If not, it stops, outputting ⊥. Otherwise, it derives the modified 
message m' = MOD(m) and computes 

  σ'_FULL ← GSig(gsk_san, cert_san, (m', pk_sig), gpk) 

and outputs m' together with σ' = (σ_FIX, σ'_FULL, ADM, pk_san, cert_san, gpk). 
Verification. Algorithm Verify gets as input a message m ∈ {0,1}*, a signature 
σ = (σ_FIX, σ_FULL, ADM, pk_san, cert_san, gpk) and public keys pk_sig = spk 
and pk_san. It first recovers m_FIX = FIX_ADM(m). It then checks whether 
SVf(spk, (m_FIX, ADM, pk_san, gpk), σ_FIX) = 1 and whether GVf(gpk, (m, pk_sig), σ_FULL) 
verifies as true under the group public key, too. If so, it outputs 1, declaring the 
entire signature as valid. Otherwise it returns 0, indicating an invalid signature. 
Proof. Algorithm Proof gets as input sk_sig, m and σ = (σ_FIX, σ_FULL, ADM, pk_san, 
cert_san, gpk). It parses the key as sk_sig = (ssk, k) and recomputes 

  (gmsk, gpk', cert'_sig, cert'_san) = GKGen(1^n, (gpk_sig, pk_san); PRF(k, 1||pk_san)) 

and checks that gpk' = gpk and cert'_san = cert_san (and immediately returns ⊥ 
if not). It next verifies that SVf(spk, (m_FIX, ADM, pk_san, gpk), σ_FIX) = 1 and, if 
so, computes and outputs (i, π) ← Open(gmsk, (m, pk_sig), σ_FULL, gpk), where 
i ∈ {Sig, San} is the identity returned by the Open algorithm (Proof instead 
returns ⊥ if any of the verification steps above fail). 

Judge. The judge, on input m, σ, pk_sig, pk_san and a proof (i, π) with 
i ∈ {Sig, San}, parses σ as (σ_FIX, σ_FULL, ADM, pk_san, cert_san, gpk). It derives 
b ← GJudge((m, pk_sig), σ_FULL, i, π, gpk) using the judge algorithm of the group 
signature scheme. If b = true it outputs i, else it outputs Sig. 

Completeness of signatures generated by the signer and sanitizer follows easily 
from the completeness of the underlying signature schemes and the fact that 
FIX_ADM leaves the fixed message parts unchanged for modified messages. There 
is a negligible probability that a signature of the signer or the sanitizer also 
verifies under the other party's key, possibly yielding a wrong answer from 
the judge. We ignore this issue here for simplicity. 

4.3 Security Proof 

We need an additional property of the admissible modifications ADM: given 
arbitrary messages m, m* ∈ {0,1}* (and a security parameter 1^n), it should 
be efficiently decidable whether m* ∈ {MOD(m) | MOD with ADM(MOD) = 1} or 
not. We call such ADM modification-decidable, and a sanitizable signature scheme 
modification-restricted if it only allows modification-decidable ADM. As an 
example, consider again block-divided messages where ADM describes the block length 
and the indices of changeable blocks. Then it is easy to check whether m* has 
been changed in admissible blocks only or not. 
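For the block-divided example, both FIX_ADM and the modification-decidability test reduce to block comparisons. The sketch below assumes (hypothetically) that ADM is encoded as a pair (block length in bytes, set of changeable block indices) and that messages are byte strings whose length is a multiple of the block length.

```python
def split_blocks(msg: bytes, block_len: int) -> list:
    """Split a message into blocks of block_len bytes, the layout ADM describes."""
    if block_len <= 0 or len(msg) % block_len:
        raise ValueError("message length is not a multiple of the block length")
    return [msg[i:i + block_len] for i in range(0, len(msg), block_len)]

def fix_adm(msg: bytes, adm) -> tuple:
    """FIX_ADM: the (index, block) pairs that ADM does not mark as changeable."""
    block_len, changeable = adm
    return tuple((i, blk) for i, blk in enumerate(split_blocks(msg, block_len))
                 if i not in changeable)

def is_admissible_modification(m: bytes, m_star: bytes, adm) -> bool:
    """Decide whether m_star = MOD(m) for some MOD with ADM(MOD) = 1."""
    block_len, _ = adm
    if len(m) != len(m_star) or block_len <= 0 or len(m) % block_len:
        return False
    # Same length and identical fixed blocks: m_star differs only in changeable blocks.
    return fix_adm(m, adm) == fix_adm(m_star, adm)
```

For example, with ADM = (4, {1}), the message b"NAMEdataTAIL" may become b"NAMExxxxTAIL" but not b"nameDATATAIL", and both admissible versions share the same FIX_ADM value, which is exactly what σ_FIX signs.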
Theorem 2. Let S be a strongly unforgeable deterministic signature scheme 
and let GS be a secure group signature scheme. Assume further that PRF is 
a pseudorandom function. Then the modification-restricted sanitizable signature 
scheme in Construction 1 is unforgeable, immutable, private, proof-restricted 
transparent, accountable and unlinkable. 

As unlinkability implies privacy, and as, moreover, sanitizer-accountability and 
signer-accountability imply unforgeability, it suffices to prove these two types of 
accountability as well as unlinkability, immutability and (proof-restricted) 
transparency. 

For the proof idea, note that we can reduce transparency of our sanitizable 
signatures to anonymity of the underlying group signature scheme. Traceability of 
the group signature scheme enables the group manager (i.e., the signer) to 
provide a proof that a message has indeed been signed by a certain group member. 
Thus, if the sanitizer signs a message, the signer can produce evidence that this 
signature originates from the sanitizer. This shows sanitizer-accountability. Vice 
versa, the non-frameability property of the group signature scheme ensures that the 
group manager (i.e., the signer) cannot falsely accuse a group member of having 
signed a message. Therefore, signer-accountability follows from non-frameability. 

The unforgeability of the underlying regular signature scheme ensures 
immutability: if the sanitizer changed inadmissible parts of a message, she would be 
obliged to forge a signature for the fixed part. Unlinkability holds as the sanitizer 
creates a new group signature from scratch when sanitizing. Furthermore, the 
signature of the regular signature scheme remains unchanged, and is identical 
for different documents with the same fixed part because we use a deterministic 
scheme. The formal proof follows these ideas and appears in the full paper. 
Abstract. Encrypt-and-sign, where one encrypts and signs a message 
in parallel, is usually not recommended for confidential message transmission, 
as the signature may leak information about the message. This 
motivates our investigation of confidential signature schemes, which hide 
all information about (high-entropy) input messages. In this work we 
provide a formal treatment of confidentiality for such schemes. We give 
constructions meeting our notions, both in the random oracle model 
and the standard model. As part of this we show that full domain hash 
signatures achieve a weaker level of confidentiality than Fiat-Shamir 
signatures. We then examine the connection of confidential signatures to 
signcryption schemes. We give formal security models for deterministic 
signcryption schemes for high-entropy and low-entropy messages, and 
prove encrypt-and-sign to be secure for confidential signature schemes 
and high-entropy messages. Finally, we show that one can derandomize 
any signcryption scheme in our model and obtain a secure deterministic 
scheme. 


1 Introduction 

A common mistake amongst novice cryptographers is to assume that digital 
signature schemes provide some kind of confidentiality service to the message 
being signed. The (faulty) argument in support of this statement is (a) that 
all signature schemes are of the “hash-and-sign” variety, which apply a hash 
function to a message before applying any kind of keyed operation, and (b) that 
a one-way hash function will hide all partial information about a message. Both 
facets of this argument are incorrect. However, it does suggest that notions of 
confidentiality for signature schemes are an interesting avenue of research. 

The question of confidentiality of hash functions in signature schemes was 
previously considered by Canetti [7] as "content-concealing signatures"; however, 
the original treatment only serves to motivate the concept of perfect one-way 
hash functions [7, 8]. We provide a more formal treatment here. The question 
of entropic security has been considered by several other authors. Dodis and 
Smith studied entropically secure primitives, requiring that they leak no function 
of their input [12]. Russell and Wang [22] consider the security of symmetric 
encryption schemes based on high-entropy messages, and several authors have 
considered the security of asymmetric encryption schemes based on high-entropy 
messages mm- However, we are the first authors to consider the confidentiality of 
signatures and signcryption schemes in this scenario. 

We believe that the concept of confidential signatures is intrinsically interest¬ 
ing and may prove to be useful in the construction of protocols in which two 
entities need to check that they are both aware of a particular message which 
(a) contains some confidential information, such as a password, and (b) contains 
a high entropy component, such as a confidential nonce. 

Defining Confidential Signatures. Our first contribution is to define confidential 
signatures. Our starting point is high-entropy messages (signatures for messages 
with low entropy inevitably leak through the verification algorithm of the 
signature scheme). Our definitions are based on previous efforts for deterministic 
public-key encryption [3], and yield three models for confidential signature 
schemes: 

— Weak confidentiality means that no information is leaked to a passive 
adversary, except possibly for information related to the technical details of the 
signature scheme. 

— Mezzo confidentiality means that no information is leaked to a passive 
adversary (in possession of the verification key). Note that this is in contrast 
to deterministic public-key encryption, where information cannot be hidden 
in such circumstances [3]. 

— Strong confidentiality means that no information is leaked to an active 
adversary (in possession of the verification key). 

Our definitions are general enough to cover probabilistic and deterministic 
signature schemes, although we need an additional stipulation in the latter case, 
preventing the case where the leaked information is the unique signature itself. 
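The remark that signatures on low-entropy messages inevitably leak through verification can be made concrete. The sketch assumes a toy full-domain-hash RSA signature (textbook modulus n = 61 · 53, with SHA-256 reduced mod n as the "full-domain" hash), which is nowhere near secure. Anyone holding the public key simply runs verification against every candidate message. Because the toy hash range is tiny, several candidates may verify, but the signed message is always among them.

```python
import hashlib

# Textbook toy RSA key: N = 61 * 53, E * D = 1 mod (60 * 52). Illustrative only.
N, E, D = 3233, 17, 2753

def fdh(msg: bytes) -> int:
    """Full-domain-hash stand-in: SHA-256 reduced mod N (assumption for the sketch)."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N

def sign(msg: bytes) -> int:
    return pow(fdh(msg), D, N)

def verify(msg: bytes, sig: int) -> bool:
    # Uses only the public key (N, E).
    return pow(sig, E, N) == fdh(msg)

def recover_candidates(sig: int, message_space):
    """With low-entropy messages, testing every candidate reveals the message."""
    return [m for m in message_space if verify(m, sig)]
```

For a ballot drawn from [b"yes", b"no", b"abstain"], the signature on b"no" alone lets any verifier single out b"no" among the candidates, which is why the confidentiality notions are restricted to high-entropy messages.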

Relation to Anonymous Signatures. There are similarities between confidential 
signatures and anonymous signatures mE3j. Anonymous signatures hide the 
identity of the signer of a high-entropy message, whereas confidential signatures 
hide all the information about the message itself. The relationship between these 
two primitives is similar to the relationship between anonymous encryption and 
traditional public key encryption. 

Constructing Confidential Signatures. We then show how to obtain confidential 
signatures. We first introduce the related concept of confidential hash functions, 
akin to hiding hash functions [3]. We prove that random oracles are confidential 
hash functions, as are perfectly one-way hash functions [7, 8] in a weaker form. 

We then show that the use of weakly confidential hash functions in full 
domain hash (FDH) signature schemes yields weakly confidential signatures. We 
show that FDH signature schemes and Fiat-Shamir signatures are confidential 
in the random oracle model. We also show that strongly secure confidential 
signatures can be obtained in the standard model via the use of a randomness 
extractor [19, 20] (provided the message entropy lies above some fixed bound). 
Applications to Signcryption. Secure message transmission is usually performed 
via the encrypt-then-sign paradigm, where the sender encrypts the message 
under the receiver's public encryption key and then signs the ciphertext with his 
own signing key. Signcryption schemes, introduced by [23], aim to gain 
efficiency by combining the two operations. One consequence of previous security 
definitions m is that the encrypt-and-sign approach, where one encrypts the 
message and signs the message in parallel, does not provide a secure signcryption 
in general, as the signature may reveal information about the message. 

We introduce security notions for (possibly deterministic) signcryption 
schemes with high-entropy messages, along the lines of deterministic public-key 
encryption and confidential signatures. In the case of signcryption schemes, we can 
also give a low-entropy-message version and show that this definition is strictly 
stronger than the definitions for high-entropy messages. We show that the 
parallelizable encrypt-and-sign scheme is high-entropy confidential if the underlying 
encryption scheme is IND-CCA2 and the signature scheme is confidential 
(and deterministic). We finally prove that we can derandomize any signcryption 
scheme to derive a secure deterministic scheme. 

Besides the fact that some of our results require the signcryption scheme to 
be deterministic, we also believe that deterministic signcryption schemes may be 
intrinsically more secure than many current schemes. The reason is that most of 
the current signcryption schemes are based on discrete-logarithm-based digital 
signature schemes which are highly sensitive to imperfect randomness [18]. 

In situations where we have been forced due to size constraints to omit a 
theorem’s proof, the proof can be found in the full version of the paper m- 

2 Confidential Signature Schemes 

We formalise the notion of a confidential signature in three ways and give 
constructions. These confidentiality notions can be applied to either probabilistic or 
deterministic signature schemes. 

2.1 Definition of Confidential Signature Schemes 

A digital signature scheme is a tuple of efficient algorithms SS = (SS.Setup, 
SS.Kg, SS.Sign, SS.Ver). All algorithms (in this article) are probabilistic 
polynomial-time (PPT) in the security parameter k (which we assume clear 
from the context). The parameter generation algorithm produces a set of 
parameters common to all users, λ_ss ← SS.Setup(1^k); subsequently the key 
generation algorithm produces a public/private key pair (pk, sk) ← SS.Kg(λ_ss). 
(Until Section 4.2 we will silently assume that λ_ss allows retrieval of k and 
that both pk and sk allow retrieval of λ_ss, simplifying notation.) The signing 
algorithm takes a message m ∈ {0,1}* and the private key, and outputs a signature 
σ ← SS.Sign(sk, m). The verification algorithm takes as input a message, signature 
and public key, and outputs either a valid symbol ⊤ or an invalid symbol 
⊥. This is written SS.Ver(pk, m, σ). The standard notion for signature security 


Confidential Signatures and Deterministic Signcryption 


465 


(a) Expt^{wSig-b}_A(k): 
  λ_ss ← SS.Setup(1^k) 
  (pk, sk) ← SS.Kg(λ_ss) 
  (m_0, t_0) ← A_1(λ_ss) 
  (m_1, t_1) ← A_1(λ_ss) 
  σ* ← SS.Sign(sk, m_b) 
  t' ← A_2^{SS.Sign(sk,·)}(pk, σ*) 
  If t' = t_0 then output 1 
  Else return 0 

(b) Expt^{mSig-b}_A(k): 
  λ_ss ← SS.Setup(1^k) 
  (pk, sk) ← SS.Kg(λ_ss) 
  (m_0, t_0) ← A_1(pk) 
  (m_1, t_1) ← A_1(pk) 
  σ* ← SS.Sign(sk, m_b) 
  t' ← A_2^{SS.Sign(sk,·)}(pk, σ*) 
  If t' = t_0 then output 1 
  Else return 0 

(c) Expt^{sSig-b}_A(k): 
  λ_ss ← SS.Setup(1^k) 
  (pk, sk) ← SS.Kg(λ_ss) 
  (m_0, t_0) ← A_1^{SS.Sign(sk,·)}(pk) 
  (m_1, t_1) ← A_1^{SS.Sign(sk,·)}(pk) 
  σ* ← SS.Sign(sk, m_b) 
  t' ← A_2^{SS.Sign(sk,·)}(pk, σ*) 
  If t' = t_0 then output 1 
  Else return 0 

Fig. 1. Notions of confidentiality for (a) weakly confidential signature schemes; (b) 
mezzo confidential signature schemes; (c) strongly confidential signature schemes. The 
signing algorithm is applied to the message vector m component-wise. 


is that of unforgeability under chosen message attacks (see Appendix A.1 for 
formal definitions). 

We present three confidentiality notions for a digital signature scheme 
(see Figure 1). These notions are split depending on the adversary's capabilities, 
which corresponds in a natural way to real-life scenarios where it may be possible 
to derive some information about a message from a signature which might be 
deemed practically useless, e.g., the value of the hash of the message, but whose 
leakage cannot be avoided. 

In the weak confidentiality model, the attacker should not be able to determine 
any information about the messages apart from that which can be obtained 
directly from the signature itself. Mezzo confidentiality models the scenario where 
the attacker is able to retrieve public keys of the users, but cannot interact 
directly with their communication network and obtain signatures of messages. 
In the strong model, an active attacker should not be able to determine any 
information about the messages apart from the signature itself. 

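For x ∈ {w, m, s}, the attacker A's advantage in the xSig game is defined to 
be: 

$$\mathrm{Adv}^{x\mathrm{Sig}}_{A}(k) = \left|\Pr\left[\mathrm{Expt}^{x\mathrm{Sig}\text{-}0}_{A}(k) = 1\right] - \Pr\left[\mathrm{Expt}^{x\mathrm{Sig}\text{-}1}_{A}(k) = 1\right]\right|.$$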
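The advantage can be estimated empirically for toy schemes. In the sketch below (all names hypothetical), A_1 samples a uniformly random 64-bit message and sets t to its lowest bit, and A_2 guesses t from the lowest bit of σ*. The experiment is simplified to single messages and omits A_2's signing oracle. leaky_sign, which outputs the message itself, gives an advantage of about 1/2, while hashed_sign, a keyed SHA-256 value standing in for a signature whose bits look unrelated to the message, gives an advantage close to 0.

```python
import hashlib
import secrets

KEY = secrets.token_bytes(32)  # hypothetical secret key of the keyed stand-in below

def leaky_sign(m: int) -> int:
    """Hypothetical 'signature' that simply contains the message."""
    return m

def hashed_sign(m: int) -> int:
    """Keyed SHA-256 stand-in whose output bits look unrelated to m."""
    digest = hashlib.sha256(KEY + m.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def run_experiment(sign, b: int) -> int:
    """One run of Expt^{b}: returns 1 iff A_2's guess equals t_0."""
    m0 = secrets.randbits(64)      # A_1, first call: 64 bits of min-entropy
    t0 = m0 & 1                    # t_0: the information A_2 tries to learn
    m1 = secrets.randbits(64)      # A_1, second call (t_1 is not used)
    sigma = sign(m1 if b else m0)  # sigma* <- Sign(sk, m_b)
    guess = sigma & 1              # A_2: read off the lowest bit of sigma*
    return int(guess == t0)

def estimate_advantage(sign, trials: int = 4000) -> float:
    p0 = sum(run_experiment(sign, 0) for _ in range(trials)) / trials
    p1 = sum(run_experiment(sign, 1) for _ in range(trials)) / trials
    return abs(p0 - p1)
```

For leaky_sign, the b = 0 experiment always outputs 1 and the b = 1 experiment outputs 1 about half of the time, which explains the advantage of roughly 1/2.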

A signature scheme is weakly confidential (resp. mezzo confidential/strongly 
confidential) if all PPT attackers A = (A_1, A_2) have negligible advantage 
Adv^{xSig}_A(k) in the wSig (resp. mSig/sSig) security game, subject to the following restrictions: 

— Pattern preserving: there exist a length function ℓ(k) and equality functions 
∘_{i,j} ∈ {=, ≠} (1 ≤ i, j ≤ ℓ(k)) such that for any admissible input a in the 
corresponding game and all possible (m, t) ← A_1(a) we have |m| = ℓ(k) 
and m_i ∘_{i,j} m_j. 

— High entropy: the function 

$$\pi(k) = \max_{m \in \{0,1\}^*} \Pr\left[m_i = m : (\mathbf{m}, t) \leftarrow A_1(a)\right]$$

is negligible, where the probability is over A_1's random tape only 
(and i ∈ ℕ and all choices of the other algorithms are fixed). The value 
μ(k) = −log_2 π(k) is termed the adversary's min-entropy. 
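For an explicitly given message distribution, π and μ are a one-line computation. The sketch below (helper names hypothetical) works with exact fractions. Note that the definition asks for π(k) to be negligible as a function of k, which a single fixed distribution can only illustrate. A distribution with one very likely message has low min-entropy no matter how many other messages it can output.

```python
import math
from fractions import Fraction

def max_probability(dist):
    """pi = max over messages m of Pr[m_i = m], for dist given as {message: probability}."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return max(dist.values())

def min_entropy(dist):
    """mu = -log2(pi), the min-entropy of the message distribution."""
    return -math.log2(float(max_probability(dist)))
```

A uniform distribution over 2^10 messages has min-entropy 10, while a distribution that outputs "password123" with probability 1/2 and otherwise one of 2^10 messages uniformly has min-entropy 1, despite its larger support.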
SS.Kg'(λ_ss): 
  r ← {0,1}^k 
  (pk, sk) ← SS.Kg(λ_ss) 
  Return (pk||r, sk||r) 

SS.Sign'(sk||r, m): 
  If m = m'||r 
    Return SS.Sign(sk, m)||m 
  Else 
    Return SS.Sign(sk, m) 

SS.Ver'(pk||r, m, σ): 
  If m = m'||r 
    Parse σ as σ'||m 
  Else σ' ← σ 
  Return SS.Ver(pk, m, σ') 

Fig. 2. A signature scheme which is weakly confidential but not mezzo confidential 

For deterministic schemes we need the following additional constraint, ruling out 
trivial attacks: 

— Signature free: A_1 does not output a message m_i ∈ m for which it has queried 
the signature oracle on m_i. (This security requirement only affects strongly 
confidential signature schemes.) 

The latter condition prevents an attacker against a deterministic scheme from 
"winning" by setting t ← SS.Sign(sk, m), i.e., it prevents the attacker from 
"winning" the game simply by determining that the message m has the property 
that its unique signature is SS.Sign(sk, m). 

The notions of confidentiality are strictly increasing in strength. If SS is a weakly confidential signature scheme, then Figure 2 depicts a scheme which is weakly confidential but not mezzo confidential. Similarly, if SS is a mezzo confidential signature scheme, then Figure 3 shows a scheme which is mezzo confidential but not strongly confidential.

SS.Kg'(λ_ss):
    (pk, sk) ←$ SS.Kg(λ_ss)
    r ←$ {0,1}^k
    σ_r ← SS.Sign(sk, 0||r)
    Return (pk, sk||r||σ_r)

SS.Sign'(sk||r||σ_r, m):
    If m = m'||r||σ_r
        Set σ' ← SS.Sign(sk, 1||m)
        Return σ = (σ', m)
    Else
        Set σ' ← SS.Sign(sk, 2||m)
        Return σ = (σ', r, σ_r)

SS.Ver'(pk, m, σ):
    If σ = (σ', m')
        Parse m' as m' = m''||r'||σ'_r
        Return ⊤ iff
            SS.Ver(pk, 1||m', σ') = ⊤, and
            m = m', and
            SS.Ver(pk, 0||r', σ'_r) = ⊤
    If σ = (σ', r', σ'_r)
        Return ⊤ iff
            SS.Ver(pk, 2||m, σ') = ⊤, and
            m ≠ m''||r'||σ'_r for any m'' ∈ {0,1}*, and
            SS.Ver(pk, 0||r', σ'_r) = ⊤
    Else return ⊥

Fig. 3. A signature scheme which is mezzo confidential but not strongly confidential


3 Confidential Hash Functions and Signature Schemes 

3.1 Confidential Hash Functions 

We recap the notion of a hiding hash function by Bellare et al. 0, but call such 
functions confidential here. For our purposes, a hash function H = (H.Kg, H) is 
a PPT pair of algorithms for key generation and hashing, respectively. We will identify the description output by the key generation algorithm H.Kg with the hash function H itself. The collision-finding advantage Adv^{coll}_{H,A}(k) of an attacker A against a hash function H is defined as

$$\mathrm{Adv}^{\mathrm{coll}}_{H,A}(k) = \Pr\left[\, H(x; r) = H(x'; r') \text{ and } (x, r) \neq (x', r') \;:\; H \leftarrow \mathsf{H.Kg}(1^k);\ (x, x', r, r') \leftarrow_{\$} A(H) \,\right].$$

The hash function H is called collision-resistant if all PPT attackers A have negligible advantage Adv^{coll}_{H,A}(k) (as a function of k). We require that the hash function is hiding/confidential against an attacker A = (A_1, A_2) playing one of the games in Figure 4. For x ∈ {w, s} the attacker's advantage is defined to be

$$\mathrm{Adv}^{x\mathrm{Hash}}_{H,A}(k) = \left|\Pr[\mathrm{Expt}^{x\mathrm{Hash}\text{-}0}_{H,A}(k) = 1] - \Pr[\mathrm{Expt}^{x\mathrm{Hash}\text{-}1}_{H,A}(k) = 1]\right|.$$

(a) Expt^{wHash-b}_{H,A}(k):
    H ←$ H.Kg(1^k)
    (x_0, t_0) ←$ A_1(1^k)
    (x_1, t_1) ←$ A_1(1^k)
    h ← H(x_b)
    t' ←$ A_2(H, h)
    If t' = t_0 then output 1
    Else return 0

(b) Expt^{sHash-b}_{H,A}(k):
    H ←$ H.Kg(1^k)
    (x_0, t_0) ←$ A_1(H)
    (x_1, t_1) ←$ A_1(H)
    h ← H(x_b)
    t' ←$ A_2(H, h)
    If t' = t_0 then output 1
    Else return 0

Fig. 4. Notions of confidentiality for (a) weakly confidential hash functions; (b) strongly confidential hash functions. The hash function is applied to the data vector x component-wise.

A hash function is weakly (resp. strongly) confidential if every PPT attacker A has negligible advantage in the corresponding game subject to the following restrictions:

- Pattern preserving: there exist a length function ℓ(k) and equality functions ◇_{i,j} ∈ {=, ≠} (1 ≤ i, j ≤ ℓ(k)) such that for all possible (x, t) ←$ A_1(1^k) we have that |x| = ℓ(k) and x_i ◇_{i,j} x_j.

- High entropy: the function π(k) = max_{x ∈ {0,1}*} Pr[x_i = x : (x, t) ←$ A_1(a)], for the admissible input a of the corresponding game, is negligible, where the probability is only over A_1's random tape. We define μ(k) = −log_2 π(k) to be the adversary's min-entropy.

Note that collision-resistant deterministic hash functions cannot achieve strong confidentiality, because an adversary A_1 can set t = H(x) for some message x, and A_2 can easily obtain this value from the hash vector h. We also note that for “unkeyed” hash functions both notions are equivalent, and so no unkeyed, deterministic hash function can be considered confidential (unless the hash function is almost constant).
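The attack in the first sentence can be simulated directly. The sketch below assumes HMAC-SHA256 under a random key as the keyed deterministic hash (an illustrative stand-in, not a construction from the paper) and runs the sHash experiment of Figure 4 with ℓ(k) = 1: since A_1 receives the description H, it sets t = H(x), and A_2 simply returns the first hash value.

```python
import hashlib
import hmac
import os

def hkg() -> bytes:
    # H.Kg: the hash description H is a random HMAC-SHA256 key (stand-in).
    return os.urandom(32)

def H(key: bytes, x: bytes) -> bytes:
    return hmac.new(key, x, hashlib.sha256).digest()

def A1(key: bytes):
    # Strong game: A_1 receives H, so it can evaluate it (not hash-free).
    x = os.urandom(32)            # one high-entropy message, l(k) = 1
    return [x], H(key, x)         # (x, t) with t = H(x)

def A2(key: bytes, h):
    return h[0]                   # the first hash value is exactly t_0 when b = 0

def expt_shash(b: int) -> int:
    key = hkg()
    x0, t0 = A1(key)
    x1, t1 = A1(key)
    h = [H(key, xi) for xi in (x0 if b == 0 else x1)]
    return 1 if A2(key, h) == t0 else 0

trials = 200
p0 = sum(expt_shash(0) for _ in range(trials)) / trials
p1 = sum(expt_shash(1) for _ in range(trials)) / trials
print(abs(p0 - p1))  # 1.0
```

The measured advantage is 1 because Expt^{sHash-0} always outputs 1, while Expt^{sHash-1} outputs 1 only if two independent random messages collide under H.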

In the random oracle model, where the adversary is granted oracle access to the hash function H instead of receiving the description as input, we give A_1 access to the random oracle in the strong case, but deny A_1 access to H in the weak case. It is easy to see that a random oracle thus achieves weak confidentiality, whereas the above attack on deterministic functions still applies in the strong case. However, under the additional constraint that A_1 does not query H on any x contained in its output x (hash-free adversaries), a random oracle is also strongly confidential:

Proposition 1 (Confidentiality of Random Oracles). For any adversary A = (A_1, A_2) where A_1 outputs vectors of length ℓ(k) with min-entropy μ(k) = −log π(k), and where A_2 makes at most q_h(k) queries to the random oracle, we have

$$\mathrm{Adv}^{x\mathrm{Hash}}_{H,A}(k) \le 2 \cdot q_h(k) \cdot \ell(k) \cdot \pi(k)$$

for x ∈ {w, s}, where A is assumed to be hash-free (in the strong case).
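For orientation, the bound can be evaluated in log_2 form, using π(k) = 2^{−μ(k)}. The parameter values below (q_h = 2^60 oracle queries, ℓ = 2^10 challenge messages, μ = 128 bits) are arbitrary illustrative choices, not values from the paper.

```python
def prop1_bound_log2(log2_qh: int, log2_len: int, min_entropy: int) -> int:
    """log2 of 2 * q_h * l * pi, where pi = 2^(-mu)."""
    return 1 + log2_qh + log2_len - min_entropy

# 2^60 random-oracle queries, vectors of 2^10 messages, mu = 128 bits
print(prop1_bound_log2(60, 10, 128))  # -57, i.e. advantage at most 2^-57
```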

As for constructions in the standard model, we note that perfectly one-way functions (POWs) [7,8] provide a partial solution. POWs have been designed to hide all information about preimages, akin to our confidentiality notion. However, all known constructions of POWs are only good for fixed (sets of) input distributions, where the distributions can depend only on the security parameter but not on the hash function description. Furthermore, known POWs usually require the conditional entropy of any x_i to be high, given the other x_j's. In light of this, any ℓ(k)-valued perfectly one-way function [8] is a weakly confidential hash function. Hence, we can build such hash functions based, for example, on claw-free permutations [8] or one-way permutations [15].

3.2 Full-Domain Hash Signatures 

A full-domain hash (FDH) signature scheme FDH for a deterministic hash function H is a signature scheme in which the signing algorithm computes a signature as σ = f(H(m)) for some secret function f, and the verification algorithm checks that g(σ) = H(m) for some public function g. More formally (assuming that FDH.Setup(1^k) outputs λ_ss = 1^k and that there exists a PPT algorithm which generates the functions (f, g) ← FDH.Kg'(λ_ss)):

FDH.Kg(λ_ss):
    (f, g) ← FDH.Kg'(λ_ss)
    H ← H.Kg(1^k)
    (pk, sk) ← ((g, H), (f, H))
    Return (pk, sk)

FDH.Sign(sk, m):
    Parse sk as (f, H)
    Return σ = f(H(m))

FDH.Ver(pk, m, σ):
    Parse pk as (g, H)
    Return ⊤ if H(m) = g(σ)
    Otherwise return ⊥

Unforgeability of FDH signatures in the ROM has been shown in [5,6].

Proposition 2 (Weak Confidentiality of FDH). The FDH-signature scheme FDH for hash function H is weakly confidential if H is weakly confidential. More precisely, for any adversary A = (A_1, A_2) against the weak confidentiality of FDH, where A_1 outputs ℓ(k) messages and A_2 makes at most q_s(k) signature queries, there exists an adversary B = (B_1, B_2) against the weak confidentiality of the hash function such that

$$\mathrm{Adv}^{w\mathrm{Sig}}_{\mathrm{FDH},A}(k) \le \mathrm{Adv}^{w\mathrm{Hash}}_{H,B}(k),$$

where B_1's running time is identical to that of A_1, and B_2's running time is that of A_2 plus Time_{FDH.Kg'}(k) + (q_s(k) + ℓ(k)) · Time_{FDH.Sign}(k) + O(k).

The proof actually shows that the signature scheme remains confidential for an adversarially chosen key pair (f, g), i.e., confidentiality relies only on the confidentiality of the hash function. Moreover, by Proposition 1 we have that FDH-signature schemes are weakly confidential in the random oracle model.
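A minimal, runnable sketch of the FDH template, assuming textbook RSA for the pair (f, g) and SHA-256 reduced modulo N as a stand-in for the full-domain hash H. The primes 2^31 − 1 and 2^61 − 1 are chosen only because they are well-known primes; a modulus of this size offers no security.

```python
import hashlib

# Toy parameters (assumption): two Mersenne primes, insecure and for illustration only.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # modular inverse, Python 3.8+

def H(m: bytes) -> int:
    # Stand-in for a full-domain hash: SHA-256 reduced mod N.
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def f(x: int) -> int:    # secret function: RSA inversion
    return pow(x, D, N)

def g(y: int) -> int:    # public function: RSA permutation
    return pow(y, E, N)

def fdh_sign(m: bytes) -> int:
    return f(H(m))

def fdh_verify(m: bytes, sigma: int) -> bool:
    return g(sigma) == H(m)

sigma = fdh_sign(b"hello")
print(fdh_verify(b"hello", sigma))   # True
print(fdh_sign(b"hello") == sigma)   # True: FDH signing is deterministic
```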

Proof. Assume that FDH is not weakly confidential and that there exists an adversary A = (A_1, A_2) successfully breaking this property. Then we construct an adversary B = (B_1, B_2) against the weak confidentiality of the hash function as follows. Adversary B_1 on input 1^k runs A_1 on input 1^k and outputs this algorithm's answer (m, t).

Algorithm B_2 receives as input a description H of the confidential hash function and a vector h of hash values. B_2 runs (f, g) ← FDH.Kg'(1^k), sets pk ← (g, H) and sk ← (f, H), and computes the signatures σ* = f(h) component-wise. It invokes A_2 on (1^k, pk, σ*) and answers each subsequent signature request for a message m by computing σ = FDH.Sign(sk, m). When A_2 outputs t', algorithm B_2 copies this output and stops.

It is easy to see that B's advantage in attacking the confidentiality of the hash function is identical to A's advantage in attacking the confidentiality of the FDH signature scheme (the fact that A_1 preserves patterns and produces high-entropy messages carries over to B_1). □

No (unforgeable) FDH-signature scheme is mezzo confidential, because a signature on the message m leaks the value H(m). More formally, an attacker A_1 can pick a message m ←$ {0,1}^k and set t ← H(m). Adversary A_2 then receives σ ← f(H(m)) and can recover t = H(m) by computing g(σ).
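The mezzo attack above takes a few lines against a toy RSA-FDH instance (textbook RSA for (f, g), SHA-256 mod N for H, insecure parameter sizes; all of these are assumptions of the sketch). A_1 only needs H, which is part of the public key, and A_2 only needs the public function g.

```python
import hashlib
import os

P, Q, E = 2**31 - 1, 2**61 - 1, 65537   # toy RSA parameters (insecure)
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))

def H(m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def fdh_sign(m: bytes) -> int:
    return pow(H(m), D, N)              # sigma = f(H(m))

# A_1: pick a random message and set the target t <- H(m) (H is in pk).
m = os.urandom(16)
t = H(m)

# The challenger signs m; A_2 sees only the public key (N, E) and sigma.
sigma = fdh_sign(m)

# A_2: apply the public function g to recover t = g(sigma) = H(m).
t_guess = pow(sigma, E, N)
print(t_guess == t)                     # True
```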

3.3 Strongly Confidential Signatures in the ROM 

Recall from the previous section that FDH signatures leak the hash value of a message. To prevent this, we make the hashing process probabilistic and compute (r, H(r, m)) for randomness r. Then A_1 cannot predict the hash values of the challenge messages due to r (which becomes public only afterwards), and A_2 cannot guess the hash values due to the entropy in the message m (even though r is then known). Our instantiation is shown in Figure 5.

Suppose SS = (SS.Setup, SS.Kg, SS.Sign, SS.Ver) is a signature scheme. We define a new signature scheme SS' as follows (where SS.Setup' = SS.Setup):

SS.Kg'(λ_ss):
    (pk, sk) ← SS.Kg(λ_ss)
    H ← H.Kg(1^k)
    pk' ← (pk, H); sk' ← (sk, H)
    Return (pk', sk')

SS.Sign'(sk', m):
    Parse sk' as (sk, H)
    r ←$ {0,1}^k
    h ← H(r, m)
    σ' ← SS.Sign(sk, h)
    σ ← (σ', r)
    Return σ

SS.Ver'(pk', m, σ):
    Parse pk' as (pk, H)
    Parse σ as (σ', r)
    Return SS.Ver(pk, H(r, m), σ')

Fig. 5. Construction of a strongly confidential signature scheme in the ROM

Proposition 3 (Random Oracle Instantiation). If H is a hash function modeled as a random oracle, then the signature scheme SS' is strongly confidential. That is, for any attacker A = (A_1, A_2) against the strong confidentiality of the signature scheme SS', where A_1 outputs a vector of length ℓ(k) with min-entropy μ(k) = −log π(k), and where A_2 asks at most q_h oracle queries (signing queries and direct hash oracle queries), we have

$$\mathrm{Adv}^{s\mathrm{Sig}}_{SS',A}(k) \le 2 \cdot q_h(k) \cdot \ell(k) \cdot \left(2^{-k} + \pi(k)\right).$$

Clearly, the scheme is also (strongly) unforgeable if the underlying signature 
scheme is (strongly) unforgeable. 
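The construction of Figure 5 can be sketched as follows, assuming SHA-256 over r||m as a stand-in for the random oracle H(r, m) and a toy RSA signature on the hash value as the underlying SS (insecure parameter sizes, illustration only). Because r is fresh for every signature, two signatures on the same message differ, and A_1 cannot precompute H(r, m).

```python
import hashlib
import os

P, Q, E = 2**31 - 1, 2**61 - 1, 65537   # toy RSA parameters (insecure)
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))
K_BYTES = 16                            # |r| = k bits with k = 128 (assumption)

def H(r: bytes, m: bytes) -> int:
    # Random-oracle stand-in; r has fixed length, so r || m encodes (r, m) uniquely.
    return int.from_bytes(hashlib.sha256(r + m).digest(), "big") % N

def ss_sign(h: int) -> int:             # underlying SS.Sign: toy RSA on h
    return pow(h, D, N)

def ss_verify(h: int, s: int) -> bool:
    return pow(s, E, N) == h

def sign_prime(m: bytes):
    r = os.urandom(K_BYTES)             # fresh randomness per signature
    return (ss_sign(H(r, m)), r)        # sigma = (sigma', r)

def verify_prime(m: bytes, sigma) -> bool:
    s, r = sigma
    return ss_verify(H(r, m), s)

s1, s2 = sign_prime(b"msg"), sign_prime(b"msg")
print(verify_prime(b"msg", s1), verify_prime(b"msg", s2))  # True True
print(s1 != s2)   # True except with probability about 2^-128
```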

3.4 Fiat-Shamir Signature Schemes 

Our second instantiation is based upon the Fiat-Shamir paradigm [11] that turns every (three-round) identification scheme into a signature scheme. An identification scheme (ID scheme) is defined by a triplet (G, S, R), where G is a key generation algorithm and the sender S wishes to prove his identity to the receiver R. More formally: G(1^k) is an efficient algorithm that outputs a key pair (ipk, isk). (S(isk), R(ipk)) are interactive algorithms, and it is required that Pr[⟨S(isk), R(ipk)⟩ = 1] = 1 (where the probability is taken over the coin tosses of S, R and G). A canonical ID scheme is a 3-round ID scheme (α; β; γ) in which α is sent by the sender S, β is sent by the receiver R and consists of R's random coins, and γ is sent by the sender. For a sender S with randomness r, we denote α = S(isk; r) and γ = S(isk, α, β; r). The Fiat-Shamir signature scheme is given in Figure 6.

In order to prove the confidentiality of this scheme, we need to assume that the commitment α of the Fiat-Shamir scheme has non-trivial entropy. This can always be achieved by appending public randomness.

Proposition 4 (Fiat-Shamir Instantiation). If H is a hash function modeled as a random oracle, then the Fiat-Shamir instantiation SS'' for non-trivial commitments is strongly confidential. More precisely, for any attacker A = (A_1, A_2) against the strong confidentiality of the signature scheme SS'', where A_1 outputs a message vector of length ℓ(k) with min-entropy μ(k) = −log π(k), α has min-entropy μ'(k) = −log π'(k), and A_2 asks at most q_h oracle queries (signing queries and direct hash oracle queries), we have

$$\mathrm{Adv}^{s\mathrm{Sig}}_{SS'',A}(k) \le 2 \cdot q_h(k) \cdot \ell(k) \cdot \left(\pi(k) + \pi'(k)\right).$$
Suppose (G, S, R) is a canonical identification scheme and H is a hash function family. We define the signature scheme SS'' = (SS.Setup'', SS.Kg'', SS.Sign'', SS.Ver'') as follows (where SS.Setup''(1^k) returns λ_ss = 1^k):

SS.Kg''(λ_ss):
    (ipk, isk) ← G(λ_ss)
    H ← H.Kg(1^k)
    pk' ← (ipk, H); sk' ← (isk, H)
    Return (pk', sk')

SS.Sign''(sk', m):
    Parse sk' as (isk, H)
    r ←$ {0,1}^k
    α ← S(isk; r)
    β ← H(α, m)
    γ ← S(isk, α, β; r)
    σ ← (α, β, γ)
    Return σ

SS.Ver''(pk', m, σ):
    Parse pk' as (ipk, H)
    Parse σ as (α, β, γ)
    β' ← H(α, m)
    Return 1 iff β = β' and R(ipk, α, β, γ) = 1

Fig. 6. The Fiat-Shamir paradigm that turns every ID scheme into a signature scheme
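Schnorr's identification protocol is a standard example of a canonical ID scheme, and applying Figure 6 to it gives a Schnorr-type signature. The Python sketch below keeps all three components σ = (α, β, γ) to mirror the figure. The group p = 2039, q = 1019, g = 4 (the order-q subgroup of quadratic residues) is a toy choice of ours, far too small for security, and SHA-256 reduced mod q stands in for H. The commitment α = g^r is uniform in the subgroup, which is the kind of non-trivial commitment entropy Proposition 4 asks for.

```python
import hashlib
import secrets

# Toy Schnorr group (assumption, insecure sizes): p = 2q + 1, g = 4 has order q.
p, q, g = 2039, 1019, 4

def H(alpha: int, m: bytes) -> int:
    data = alpha.to_bytes(8, "big") + m
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def G():
    isk = secrets.randbelow(q)
    return pow(g, isk, p), isk          # (ipk, isk)

def sign(isk: int, m: bytes):
    r = secrets.randbelow(q)            # sender coins
    alpha = pow(g, r, p)                # commitment alpha = S(isk; r)
    beta = H(alpha, m)                  # challenge beta = H(alpha, m)
    gamma = (r + beta * isk) % q        # response gamma = S(isk, alpha, beta; r)
    return alpha, beta, gamma

def verify(ipk: int, m: bytes, sigma) -> bool:
    alpha, beta, gamma = sigma
    if beta != H(alpha, m):             # check beta = beta'
        return False
    # R(ipk, alpha, beta, gamma) = 1 iff g^gamma = alpha * ipk^beta (mod p)
    return pow(g, gamma, p) == (alpha * pow(ipk, beta, p)) % p

ipk, isk = G()
sigma = sign(isk, b"hello")
print(verify(ipk, b"hello", sigma))     # True
```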


3.5 Strongly Confidential Signatures from Randomness Extraction 

Our instantiation in the standard model relies on randomness extractors pren] 
and is depicted in Figure 7. The main idea is to smooth the distribution of the message via an extractor, and to sign the almost uniform value h.

Recall that a strong (a, b, n, t, ε)-extractor is an efficient algorithm Ext: {0,1}^a × {0,1}^b → {0,1}^n which takes some random input m ∈ {0,1}^a (sampled according to some distribution with min-entropy at least t) and some randomness r ∈ {0,1}^b. It outputs h ← Ext(m, r) such that the statistical distance between (r, h) and (r, u) is at most ε for uniform random values r ∈ {0,1}^b and u ∈ {0,1}^n.
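The extraction step can be illustrated with a standard universal hash family: multiply the message by the seed r in GF(2^128) and keep the top n bits. For m ≠ m', the XOR of the two outputs is the top n bits of r·(m ⊕ m'), which is uniform over a uniform seed r, so collisions occur with probability 2^{-n}; by the leftover hash lemma this family is a strong extractor with ε on the order of 2^{-(t-n)/2}. This is our choice of example, not the paper's. The family is linear and therefore not collision-resistant, so it is not a drop-in for the keyed, collision-resistant Ext of Figure 7; the names gf128_mul and ext are hypothetical.

```python
import secrets

A_BITS = 128                       # input length a: messages are 128-bit integers
R_POLY = (1 << 128) | 0x87         # x^128 + x^7 + x^2 + x + 1 (irreducible)

def gf128_mul(x: int, y: int) -> int:
    """Carry-less multiplication in GF(2^128), reduced modulo R_POLY."""
    z = 0
    while y:
        if y & 1:
            z ^= x
        y >>= 1
        x <<= 1
        if x >> 128:               # degree reached 128: reduce
            x ^= R_POLY
    return z

def ext(m: int, r: int, n: int = 64) -> int:
    """Ext(m, r): the top n bits of r*m in GF(2^128); the seed r has b = 128 bits."""
    return gf128_mul(r, m) >> (A_BITS - n)

r = secrets.randbits(128)                        # seed r, uniform in {0,1}^128
m = int.from_bytes(b"a 16-byte secret", "big")   # a 128-bit message
print(ext(m, r).bit_length() <= 64)              # True
```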

To ensure unforgeability we need to augment the extractor's extraction property by collision-resistance, imposing the requirement that the extractors be keyed and introducing a dependency of the extractor's parameters a, b, n, t, ε on the security parameter k. For a survey about very efficient constructions of such
collision-resistant extractors see la¬ 
In order to use extractors, we need a stronger assumption on the message distribution: we assume that the adversary A_1 now outputs vectors of messages such that each message in the vector has min-entropy greater than some fixed bound μ(k) given the other messages. Observe that the collision-resistance requirement on the extractor implies that μ must be super-logarithmic. We say that the output has conditional min-entropy μ(k).

Proposition 5 (Extractor Instantiation). If Ext is an (a, b, n, t, ε)-extractor, then the extractor instantiation SS''' is strongly confidential. More specifically, for any attacker A = (A_1, A_2) against the strong confidentiality of the signature scheme SS''', where A_1 outputs a vector of length ℓ(k) with conditional min-entropy μ(k) ≥ t(k), we have

$$\mathrm{Adv}^{s\mathrm{Sig}}_{SS''',A}(k) \le 2 \cdot \ell(k) \cdot \epsilon(k).$$

Note that our construction of the randomness extractor operates on messages of a fixed length of a(k) input bits, and the signature length depends on this value a(k). To process larger messages we can first hash input messages with a collision-resistant hash function before passing them to the extractor. In this case, some care must be taken to determine a correct bound for the entropy lost through the hash function computation.

Suppose SS = (SS.Setup, SS.Kg, SS.Sign, SS.Ver) is a signature scheme. We define a new signature scheme SS''' as follows (where SS.Setup''' = SS.Setup):

SS.Kg'''(λ_ss):
    (pk, sk) ← SS.Kg(λ_ss)
    Choose an extractor Ext
    pk' ← (pk, Ext)
    sk' ← (sk, Ext)
    Return (pk', sk')

SS.Sign'''(sk', m):
    Parse sk' as (sk, Ext)
    r ←$ {0,1}^b
    h ← Ext(m, r)
    σ' ← SS.Sign(sk, h)
    σ ← (σ', r)
    Return σ

SS.Ver'''(pk', m, σ):
    Parse pk' as (pk, Ext)
    Parse σ as (σ', r)
    Set h ← Ext(m, r)
    Return SS.Ver(pk, h, σ')

Fig. 7. Construction of a strongly confidential signature scheme based on randomness extractors

4 Deterministic Signcryption 

Signcryption is a public-key primitive which aims to simultaneously provide message confidentiality and message integrity. Signcryption was introduced by Zheng [21] and security models were independently introduced by An, Dodis and Rabin [1] and by Baek, Steinfeld and Zheng [2]. Similar to public-key encryption, achieving confidentiality in the formal security models requires that signcryption is a randomised process; however, we may also consider the confidentiality of deterministic signcryption schemes on high-entropy message spaces. We will also see that a practical version of confidentiality may even be achieved by a deterministic signcryption scheme for low-entropy message distributions.

4.1 Notions of Confidentiality for Signcryption Schemes 

A signcryption scheme is a tuple of PPT algorithms SC = (SC.Setup, SC.Kg_S, SC.Kg_R, SC.SignCrypt, SC.UnSignCrypt). The setup algorithm generates public parameters λ_sc ←$ SC.Setup(1^k) common to all algorithms. We will generally assume that all algorithms take λ_sc as an implicit input, even if it is not explicitly stated. The sender key-generation algorithm generates a key pair for the sender (pk_S, sk_S) ←$ SC.Kg_S(λ_sc) and the receiver key-generation algorithm generates a key pair for a receiver (pk_R, sk_R) ←$ SC.Kg_R(λ_sc). The signcryption algorithm takes as input a message m ∈ ℳ, the sender's private key sk_S, and the receiver's public key pk_R, and outputs a signcryption ciphertext C ←$ SC.SignCrypt(sk_S, pk_R, m). The unsigncryption algorithm takes as input a ciphertext C ∈ 𝒞, the sender's public key pk_S, and the receiver's private key sk_R, and outputs either a message m ← SC.UnSignCrypt(pk_S, sk_R, C) or an error symbol ⊥.
It is interesting to consider the basic attack on a deterministic signcryption scheme. In such an attack, the attacker picks two messages (m_0, m_1) and receives a signcryption C* of the message m_b. The attacker checks whether C* is the signcryption of m_0 by requesting the signcryption of m_0 from the signcryption oracle. As in the case of public-key encryption, we may prevent this basic attack by using a high-entropy message space, which prevents the attacker from determining which message to query to the signcryption oracle. However, unlike the case of public-key encryption (where anyone can encrypt with the public key, so such a restriction would be meaningless), we may also prevent this attack by forbidding the attacker to query the signcryption oracle on m_0 and m_1. We can therefore differentiate between the high-entropy case (in which the message distribution chosen by the attacker has high entropy) and the low-entropy case (in which the attacker is forbidden from querying the signcryption oracle on a challenge message).
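The basic attack is easy to reproduce. In the sketch below the "signcryption" is an HMAC over (pk_R, m) under a sender key: a mock that shares only the relevant property of a deterministic scheme (equal inputs give equal ciphertexts) and cannot be unsigncrypted; all names are hypothetical. Querying the oracle on m_0 and comparing the answer with C* identifies b every time, which is exactly what the high-entropy requirement or the ban on querying challenge messages rules out.

```python
import hashlib
import hmac
import os
import secrets

def det_signcrypt(sk_s: bytes, pk_r: bytes, m: bytes) -> bytes:
    # Mock with the one property that matters: the output is a deterministic
    # function of (sk_S, pk_R, m). It is not a real, decryptable signcryption.
    return hmac.new(sk_s, pk_r + b"|" + m, hashlib.sha256).digest()

def basic_attack(trials: int = 100) -> float:
    wins = 0
    for _ in range(trials):
        sk_s, pk_r = os.urandom(32), os.urandom(32)
        oracle = lambda m: det_signcrypt(sk_s, pk_r, m)   # signcryption oracle
        m0, m1 = b"attack at dawn", b"attack at dusk"
        b = secrets.randbelow(2)
        c_star = oracle((m0, m1)[b])                      # challenge ciphertext
        b_guess = 0 if oracle(m0) == c_star else 1        # query m_0 and compare
        wins += int(b_guess == b)
    return wins / trials

print(basic_attack())  # 1.0
```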

We give definitions for high-entropy and low-entropy confidentiality in Figure 8. In both cases, i.e. for x ∈ {h, l}, the attacker's advantage is defined as

$$\mathrm{Adv}^{x\mathrm{SCR}}_{SC,A}(k) = \left|\Pr[\mathrm{Expt}^{x\mathrm{SCR}\text{-}1}_{SC,A}(k) = 1] - \Pr[\mathrm{Expt}^{x\mathrm{SCR}\text{-}0}_{SC,A}(k) = 1]\right|.$$

A signcryption scheme is high-entropy confidential if every PPT attacker A has negligible advantage in the hSCR game subject to the following restrictions:

- Strongly pattern preserving: there exist a length function ℓ(k), message length functions ℓ_i(k), and equality functions ◇_{i,j} ∈ {=, ≠} (1 ≤ i, j ≤ ℓ(k)) such that for all possible (m, t) ←$ A_1(λ_sc, pk*_S, pk*_R) we have that |m| = ℓ(k), |m_i| = ℓ_i(k) and m_i ◇_{i,j} m_j.

- High entropy: the function π(k) = max_{m ∈ {0,1}*} Pr[m_i = m : (m, t) ←$ A_1(a)] is negligible, where the probability is only over A_1's random tape. The value μ(k) = −log π(k) is known as the adversary's min-entropy.

- Signature free: A_1 does not output a message m_i ∈ m where it has queried the signcryption oracle on the pair (pk*_R, m_i).

- Non-trivial: A_2 does not query the unsigncryption oracle on any pair (pk*_S, C) where C ∈ C*.

A signcryption scheme is low-entropy confidential if every PPT attacker A has negligible advantage in the lSCR game subject to the restrictions that A never queries the signcryption oracle on either (pk*_R, m_0) or (pk*_R, m_1), and A_2 never queries the unsigncryption oracle on (pk*_S, C*).

Proposition 6. Any deterministic signcryption scheme SC which is low-entropy confidential is also high-entropy confidential. In particular, for any adversary A against high-entropy confidentiality, making at most q_s(k) signcryption queries and where A_1 outputs ℓ(k) messages with min-entropy μ(k) = −log π(k), there exists an adversary A' such that

$$\mathrm{Adv}^{h\mathrm{SCR}}_{SC,A}(k) \le \ell(k) \cdot \mathrm{Adv}^{l\mathrm{SCR}}_{SC,A'}(k) + 4 \cdot q_s(k) \cdot \ell(k) \cdot \pi(k),$$

where the running time of A' equals the time of A plus O(k).
(a) Expt^{hSCR-b}_{SC,A}(k):
    λ_sc ←$ SC.Setup(1^k)
    (pk*_S, sk*_S) ←$ SC.Kg_S(λ_sc)
    (pk*_R, sk*_R) ←$ SC.Kg_R(λ_sc)
    (m_0, t_0) ←$ A_1(λ_sc, pk*_S, pk*_R)
    (m_1, t_1) ←$ A_1(λ_sc, pk*_S, pk*_R)
    C* ← SC.SignCrypt(λ_sc, sk*_S, pk*_R, m_b)
    t' ←$ A_2(λ_sc, pk*_S, pk*_R, C*)
    If t' = t_0 then output 1
    Else return 0

(b) Expt^{lSCR-b}_{SC,A}(k):
    λ_sc ←$ SC.Setup(1^k)
    (pk*_S, sk*_S) ←$ SC.Kg_S(λ_sc)
    (pk*_R, sk*_R) ←$ SC.Kg_R(λ_sc)
    (m_0, m_1, ω) ←$ A_1(λ_sc, pk*_S, pk*_R)
    C* ← SC.SignCrypt(λ_sc, sk*_S, pk*_R, m_b)
    b' ←$ A_2(C*, ω)
    Output b'

Fig. 8. Notions of confidentiality for (a) high-entropy signcryption schemes and (b) low-entropy signcryption schemes. Note that A_1 may pass the state information ω to A_2 in the lSCR game. The attackers have access to a signcryption oracle SC.SignCrypt(sk*_S, ·, ·) and an unsigncryption oracle SC.UnSignCrypt(·, sk*_R, ·).


The proof essentially shows that, since the challenge messages produced by a high-entropy attacker A_1 have min-entropy μ(k), the probability that A_2 queries the signcryption oracle on one of those messages is bounded by 4 · q_s(k) · ℓ(k) · π(k). If this does not occur, then a low-entropy attacker can easily run a high-entropy attacker as a black-box subroutine. The proof holds for deterministic schemes only; we do not know whether the same is true for probabilistic schemes.

We also have that the low-entropy confidentiality definition is strictly stronger than the high-entropy confidentiality definition. If SC is a high-entropy confidential signcryption scheme, then the signcryption scheme SC' given in Figure 9 is a high-entropy confidential signcryption scheme but not a low-entropy confidential one.


SC.SignCrypt'(sk_S, pk_R, m):
    C ← SC.SignCrypt(sk_S, pk_R, m)
    If m = 0^k
        Return C||0
    Else
        Return C||1

SC.UnSignCrypt'(pk_S, sk_R, C):
    Parse C as C'||c for c ∈ {0,1}
    m ← SC.UnSignCrypt(pk_S, sk_R, C')
    If c = 0 and m ≠ 0^k
        Return ⊥
    If c = 1 and m = 0^k
        Return ⊥
    Else
        Return m

Fig. 9. A signcryption scheme which is high-entropy secure but not low-entropy secure


4.2 The Encrypt-and-Sign Signcryption Scheme 

Initially, it may be thought that high-entropy confidentiality can be easily achieved through the combination of deterministic encryption and confidential signatures. However, many of the classic composition theorems, such as encrypt-then-sign, fail to achieve high-entropy security even when instantiated with secure components.
SC.Setup(1^k):
    λ_ss ← SS.Setup(1^k)
    λ_pke ← PKE.Setup(1^k)
    λ_sc ← (λ_ss, λ_pke)
    Return λ_sc

SC.Kg_S(λ_sc):
    Parse λ_sc as (λ_ss, λ_pke)
    (pk_S, sk_S) ← SS.Kg(λ_ss)
    Return (pk_S, sk_S)

SC.Kg_R(λ_sc):
    Parse λ_sc as (λ_ss, λ_pke)
    (pk_R, sk_R) ← PKE.Kg(λ_pke)
    Return (pk_R, sk_R)

SC.SignCrypt(λ_sc, pk_R, sk_S, m):
    Parse λ_sc as (λ_ss, λ_pke)
    c ← PKE.Enc(λ_pke, pk_R, (pk_S||m))
    σ ← SS.Sign(λ_ss, sk_S, (pk_R||m))
    Return C = (c, σ)

SC.UnSignCrypt(λ_sc, sk_R, pk_S, C):
    Parse λ_sc as (λ_ss, λ_pke)
    Parse C as (c, σ)
    (pk'_S||m') ← PKE.Dec(λ_pke, sk_R, c)
    If pk'_S ≠ pk_S, reject
    Extract pk_R from sk_R
    If SS.Ver(λ_ss, pk_S, (pk_R||m'), σ) = ⊥, reject
    Return m'

Fig. 10. The Encrypt-and-Sign signcryption scheme


However, we can show that encrypt-and-sign (which is typically insecure as a signcryption scheme) is secure when instantiated with an IND-CCA2 public-key encryption scheme and a strongly confidential signature scheme¹. The construction is given in Figure 10. The scheme can easily be shown to be unforgeable (in the sense that an attacker cannot obtain a signcryption of any message which was not previously sent by that sender to that receiver).

Theorem 1. If the signature scheme is deterministic, strongly unforgeable, and strongly confidential, and the encryption scheme is IND-CCA2 secure, then the signcryption scheme is confidential in the high-entropy model. In particular, if there exists an attacker A against the high-entropy security of the signcryption scheme (asking ℓ(k) challenge messages and making at most q_sc(k) signcryption queries), then there exist attackers A_pke, A_ss, and A_sunf against the IND-CCA2 security of the encryption scheme, against the strong confidentiality of the signature scheme, and against the strong unforgeability of the signature scheme, such that

$$\mathrm{Adv}^{h\mathrm{SCR}}_{SC,A}(k) \le \ell(k) \cdot \mathrm{Adv}^{\mathrm{ind\text{-}cca2}}_{PKE,A_{pke}}(k) + \mathrm{Adv}^{s\mathrm{Sig}}_{SS,A_{ss}}(k) + \mathrm{Adv}^{\mathrm{seuf\text{-}cma}}_{SS,A_{sunf}}(k),$$

where the running times of A_pke, A_ss, and A_sunf equal that of A plus (q_sc(k) + ℓ(k)) · (Time_{SC.SignCrypt}(k) + Time_{SC.UnSignCrypt}(k)) + O(k).

The security of this scheme can be proven in a manner similar to the encryption/signature composition theorems proven by An et al. [1].


¹ Strongly confidential, probabilistic signature schemes are given in Sections 3.3 and 3.4. These can be transformed into strongly confidential, deterministic signature schemes using the derandomization techniques discussed in the next section.
4.3 Derandomization

Goldreich [12] presents a technique to turn any probabilistic signature scheme into a deterministic one. The idea is to include the secret key κ of a pseudorandom function (PRF.Kg, PRF) in the secret signing key and, when signing a message m, use the random coins r = PRF(κ; m) in this process. Note that the resulting scheme now yields the same signature if run twice on the same message. A formal definition of a PRF can be found in Appendix A.
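Goldreich's transformation can be sketched on a toy Schnorr-type Fiat-Shamir signature, assuming HMAC-SHA256 as the PRF and reducing its output modulo q to obtain the coins r (the reduction introduces a negligible bias for a 256-bit output). The group p = 2039, q = 1019, g = 4 is an insecure toy choice. RFC 6979 applies a related idea, HMAC-based nonce derivation, to DSA and ECDSA.

```python
import hashlib
import hmac
import secrets

p, q, g = 2039, 1019, 4      # toy group, order-q subgroup (insecure sizes, assumption)

def H(alpha: int, m: bytes) -> int:
    data = alpha.to_bytes(8, "big") + m
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prf(kappa: bytes, m: bytes) -> int:
    # PRF(kappa; m) instantiated with HMAC-SHA256 (assumption), mapped to Z_q.
    return int.from_bytes(hmac.new(kappa, m, hashlib.sha256).digest(), "big") % q

def keygen():
    isk = secrets.randbelow(q)
    kappa = secrets.token_bytes(32)     # PRF key stored inside the signing key
    return pow(g, isk, p), (isk, kappa)

def sign_det(sk, m: bytes):
    isk, kappa = sk
    r = prf(kappa, m)                   # coins r = PRF(kappa; m), no fresh randomness
    alpha = pow(g, r, p)
    beta = H(alpha, m)
    return alpha, beta, (r + beta * isk) % q

def verify(ipk: int, m: bytes, sigma) -> bool:
    alpha, beta, gamma = sigma
    return beta == H(alpha, m) and pow(g, gamma, p) == alpha * pow(ipk, beta, p) % p

ipk, sk = keygen()
s1, s2 = sign_det(sk, b"hello"), sign_det(sk, b"hello")
print(s1 == s2, verify(ipk, b"hello", s1))   # True True
```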

We show that Goldreich's idea applies to signcryption schemes as well, taking advantage of the fact that a signcryption scheme involves a secret signing key in which we can put the key κ of the pseudorandom function. Nonetheless, whereas a probabilistic signcryption scheme usually hides the fact that the same message has been encrypted twice, a derandomized version clearly leaks this information.

For a signcryption scheme SC, the derandomized version SC^PRF based on a pseudorandom function PRF works according to Goldreich's strategy:

SC.Setup^PRF(1^k):
    Return λ_sc ← SC.Setup(1^k)

SC.Kg_S^PRF(λ_sc):
    (sk_S, pk_S) ← SC.Kg_S(λ_sc)
    κ ← PRF.Kg(1^k)
    sk_S^PRF ← (sk_S, κ); pk_S^PRF ← pk_S
    Return (sk_S^PRF, pk_S^PRF)

SC.Kg_R^PRF(λ_sc):
    Return (sk_R, pk_R) ← SC.Kg_R(λ_sc)

SC.SignCrypt^PRF(sk_S^PRF, pk_R, m):
    Parse sk_S^PRF as (sk_S, κ)
    r ← PRF(κ, (pk_R, m))
    C ← SC.SignCrypt(sk_S, pk_R, m; r)   (i.e. using randomness r)
    Return C

SC.UnSignCrypt^PRF(sk_R, pk_S^PRF, C):
    Return SC.UnSignCrypt(sk_R, pk_S, C)

Proposition 7 (Derandomized Signcryption). Let SC be an unforgeable and high-entropy (resp. low-entropy) confidential signcryption scheme. Then the scheme SC^PRF is a deterministic, unforgeable signcryption scheme which is high-entropy (resp. low-entropy) confidential. That is, for x ∈ {l, h} and any adversary A = (A_1, A_2) against xSCR confidentiality, there exist adversaries D and B = (B_1, B_2) such that

$$\mathrm{Adv}^{x\mathrm{SCR}}_{SC^{\mathrm{PRF}},A}(k) \le 2 \cdot \mathrm{Adv}^{\mathrm{PRF}}_{\mathrm{PRF},D}(k) + \mathrm{Adv}^{x\mathrm{SCR}}_{SC,B}(k) + 2\, q_{sc}(k) \cdot \ell(k) \cdot \pi(k),$$

where D's running time is identical to the time of A plus Time_{SC.Setup}(k) + Time_{SC.Kg_S}(k) + Time_{SC.Kg_R}(k) + (q_sc + ℓ(k)) · Time_{SC.SignCrypt}(k) + O(k); the running time of B equals the time of A plus O(q_sc · log q_sc).
A Standard Security Notions

A.1 Signature Schemes

The standard notion for signature security is that of (strong) existential unforgeability under chosen message attacks (sEUF-CMA). The strong version is defined below. Freshness of (m, σ) indicates that σ was never received by A as a response to a signing request on m.

$$\mathrm{Adv}^{\mathrm{seuf\text{-}cma}}_{SS,A}(k) = \Pr\left[\begin{array}{l} \mathsf{SS.Ver}(\lambda_{ss}, pk, m, \sigma) = \top \\ \text{and } (m, \sigma) \text{ is fresh} \end{array} \;:\; \begin{array}{l} \lambda_{ss} \leftarrow \mathsf{SS.Setup}(1^k) \\ (pk, sk) \leftarrow_{\$} \mathsf{SS.Kg}(\lambda_{ss}) \\ (m, \sigma) \leftarrow_{\$} A^{\mathsf{SS.Sign}(\lambda_{ss}, sk, \cdot)}(\lambda_{ss}, pk) \end{array}\right]$$

The advantage Adv^{euf-cma}_{SS,A}(k) of the slightly weaker notion (EUF-CMA) is defined analogously, but this time only m needs to be fresh.
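The difference between the two freshness conditions fits in a few lines. The sketch below (hypothetical helper names) logs the signing-oracle answers and checks a candidate forgery under both notions: a new signature on an already-signed message counts as fresh for sEUF-CMA but not for EUF-CMA.

```python
class SigningOracleLog:
    """Records signing queries so that a forgery can be checked for freshness."""

    def __init__(self):
        self.pairs = set()       # (m, sigma) pairs returned to A
        self.messages = set()    # messages queried by A

    def record(self, m: bytes, sigma: bytes) -> None:
        self.pairs.add((m, sigma))
        self.messages.add(m)

    def fresh_strong(self, m: bytes, sigma: bytes) -> bool:
        # sEUF-CMA: only the exact pair (m, sigma) must be new
        return (m, sigma) not in self.pairs

    def fresh_weak(self, m: bytes) -> bool:
        # EUF-CMA: the message m itself must never have been queried
        return m not in self.messages

log = SigningOracleLog()
log.record(b"m", b"sig-1")
print(log.fresh_strong(b"m", b"sig-2"), log.fresh_weak(b"m"))  # True False
```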


A.2 Public-Key Encryption 

A public-key encryption scheme is a tuple of algorithms PKE = (PKE.Setup, PKE.Kg, PKE.Enc, PKE.Dec). First the common parameters for the given security level k ∈ ℕ are generated by λ_pke ←$ PKE.Setup(1^k), after which a user's public/private keys are generated using (pk, sk) ←$ PKE.Kg(λ_pke). Given such a key pair, a message m ∈ {0,1}* is encrypted by c ←$ PKE.Enc(λ_pke, pk, m); a ciphertext is decrypted by m ← PKE.Dec(λ_pke, sk, c). For consistency, we require that for all messages m ∈ {0,1}*, we have that PKE.Dec(sk, PKE.Enc(pk, m)) = m.
We require that the PKE scheme is secure against IND-CCA2 attacks |21ll3j, for which the advantage of an adversary A = (A_1, A_2) is defined as

$$\mathrm{Adv}^{\mathrm{ind\text{-}cca2}}_{PKE,A}(k) = \left|\Pr[\mathrm{Expt}^{\mathrm{cca2}\text{-}0}_{PKE,A}(k) = 1] - \Pr[\mathrm{Expt}^{\mathrm{cca2}\text{-}1}_{PKE,A}(k) = 1]\right|,$$

where (for b ∈ {0,1}):

Expt^{cca2-b}_{PKE,A}(k):
    λ_pke ← PKE.Setup(1^k)
    (pk, sk) ← PKE.Kg(λ_pke)
    (m_0, m_1, ω) ←$ A_1^{PKE.Dec(sk, ·)}(λ_pke, pk)
    c* ← PKE.Enc(λ_pke, pk, m_b)
    b' ←$ A_2^{PKE.Dec(sk, ·)}(c*, ω)
    Output b'

The adversary A_2 may not query PKE.Dec(sk, ·) with c*. A PKE scheme PKE is IND-CCA2 secure if the advantage function Adv^{ind-cca2}_{PKE,A}(k) is negligible for all probabilistic polynomial-time adversaries A = (A_1, A_2).

A.3 Pseudo-Random Functions

A pseudo-random function is a pair of algorithms PRF = (PRF.Kg, PRF). The key generation algorithm outputs a key κ ←$ PRF.Kg(1^k). For our purposes, a pseudo-random function PRF(κ, ·) takes arbitrary bitstrings as inputs and outputs a bitstring in a given space ℛ. Let 𝓕 be the set of all functions f: {0,1}* → ℛ. The security of a PRF against a PPT attacker A is defined by the following two games:

Expt^{PRF-0}_A(k):
    κ ←$ PRF.Kg(1^k)
    Return A^{PRF(κ, ·)}(1^k)

Expt^{PRF-1}_A(k):
    f ←$ 𝓕
    Return A^{f(·)}(1^k)

The attacker's advantage is defined to be:

$$\mathrm{Adv}^{\mathrm{PRF}}_{\mathrm{PRF},A}(k) = \left|\Pr[\mathrm{Expt}^{\mathrm{PRF}\text{-}0}_A(k) = 1] - \Pr[\mathrm{Expt}^{\mathrm{PRF}\text{-}1}_A(k) = 1]\right|.$$
aggregate signature (IBAS) schemes, secure under RSA assumption. Our 
schemes reduce round complexity of previous RSA-based IBMS scheme 
of Bellare and Neven |BN07| from three to two rounds. Surprisingly, this 
improvement comes at virtually no cost, as the computational efficiency 
and exact security of the new scheme are almost identical to those of 
IBN07) . The new scheme is enabled by a technical tool of independent 
interest, a class of zero-knowledge proofs of knowledge of preimages of 
one-way functions which is straight-line simulatable, enabling concur¬ 
rency and good exact security, and aggregatable, enabling aggregation of 
parallel instances of such proofs into short multi/aggregate signatures. 


1 Introduction 

A multisignature protocol allows a group of players to sign the same message by generating a short string, called a multisignature, which can be verified against the set of the public keys of these players. An aggregate signature is a generalization of this notion to the case where each player signs a potentially different message. Such schemes reduce the bandwidth needed to transmit signatures, the space needed to store them, and the time needed to verify them, from linear in the number of the cosigners to a constant. Reducing bandwidth is especially important for low-energy devices, such as RFID chips and sensors, which communicate over energy-consuming wireless channels where data transmission consumes several orders of magnitude more energy than arithmetic operations (see e.g. [BA03]). Standard multi-/aggregate signatures reduce the space taken by n signatures from O(n) to O(1), but the verifiers still need the public keys of n signers. Therefore, in applications where bandwidth is a bottleneck, it can be useful to consider identity-based multi-/aggregate signatures where verifiers only need unique identifiers of signers, e.g. 32-bit IP addresses, instead of public keys.

Identity-Based (Multi-/Aggregate) Signatures. Identity-based cryptography [Sha84] simplifies public key management by replacing users' public keys with their identities, e.g. their names, e-mail addresses or IP addresses. In an identity-based scheme a trusted party, the Private Key Generator (PKG), generates a private key corresponding to each user's identity, and messages signed using such keys can then be verified using the signer's identity and the PKG's master public key.


P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010, LNCS 6056, pp. 480-498, 2010.
© International Association for Cryptologic Research 2010
In the case of identity-based multi-/aggregate signatures, if all signers have their private keys issued by the same PKG, then the verifier needs only the PKG's master public key and the identities of all signers. Note that in many applications the identities of signers are already present in the protocol messages, e.g. usernames, or IP addresses in packet headers, in which case an identity-based multi-/aggregate signature adds only a constant bandwidth overhead over unauthenticated messages.

Current State of the Art. Standard signatures imply identity-based signatures following the "certification paradigm" (e.g. [GHK06]), i.e. by simply attaching the signer's public key and certificate to each signature. However, it is not clear how to apply this idea to convert standard multi-/aggregate signatures, e.g. [IBIMnmlBHTOSl], into identity-based ones, because it is not clear how to aggregate $n$ separate public keys and certificates, even if all certificates are signed by the same CA. (Standard aggregate signatures can be used to eliminate the overhead of the CA's signatures on the certificates, but this would not eliminate the overhead due to the public keys.)

The first efficient IBAS/IBMS schemes designed from scratch are due to Gentry and Ramzan [GR06]. Their schemes employ a group with a bilinear map, their security relies, in the Random Oracle Model (ROM), on the hardness of the GapDH problem, the schemes are non-interactive, and both signing and verification take $O(1)$ exponentiations and bilinear map operations. However, the IBAS scheme of [GR06] requires all cosigners to share a common token for every set of signatures they want to aggregate, and each cosigner must ensure that this token has not previously been used in signing a different message; hence in some applications this scheme needs an extra communication round for the participants to agree on a fresh common token. In subsequent work, Boldyreva et al. [BGOY10] (correcting a previous version of this paper) proposed an IBAS scheme which does not need these unique tokens, but it requires a sequential communication pattern and is based on a more complex bilinear map assumption. Note that while sequential communication is perfectly suited to some applications, e.g. secure route discovery [KT05], it introduces unnecessary overhead for players connected e.g. by a broadcast channel or a tree topology.

Without bilinear maps, Bellare and Neven [BN07] gave an IBMS scheme which relies on the RSA assumption in the ROM. Their scheme also has fast multisignature generation and verification, requiring $O(1)$ exponentiations, but it takes three rounds of interaction. Note that any 3-round IBMS scheme implies a 4-round IBAS scheme if all cosigners' messages are broadcast and the IBMS scheme is run on their concatenation. (Moreover, in the IBMS scheme of [BN07] this broadcast can be piggybacked on the first protocol round, giving a 3-round IBAS scheme.) However, such a broadcast of all messages to all cosigners consumes bandwidth which might not otherwise be required, so apart from this generic transformation it is interesting to consider IBAS schemes which do not require such a broadcast. (As a side remark, we believe that the 3-round IBMS scheme of [BN07] can be modified into a 3-round IBAS scheme without such a broadcast, e.g. using ideas similar to our IBAS scheme [BJ10].)
Fig. 1. (1) All schemes have been given security proofs only in the ROM; (2) The IBAS scheme of [GR06] assumes that the players share a unique and common token for every instance of the IBAS scheme. This requirement can be avoided at the cost of an additional round of interaction, while the scheme of [BGOY10] requires sequential aggregation; (3) Signing time is measured per player. In both signing and verification costs, P is the cost of one pairing operation, M is the cost of a scalar multiplication on an elliptic curve, and E is the cost of a (multi-)exponentiation in $\mathbb{Z}_n^*$ (with about 80-bit exponents); (4) Signature length is measured in bits, where $\kappa$ is the security parameter, $n$ is an RSA modulus, $l$ is an upper bound on the number of players, $G_1$ and $G_2$ are two groups of elliptic curve points with an asymmetric bilinear map, $G$ is a group of elliptic curve points with a symmetric bilinear map, and $|A|$ stands for the bit size of the representation of elements in group $A$. Typical values for these parameters are $\kappa = 160$, $|G_1| = 160$, $|G| = 512$, $\log l = 20$, and $|\mathbb{Z}_n^*| = 1024$ or $2048$.


Our Contributions. We propose IBMS and IBAS schemes, secure under the RSA assumption in the ROM, which require only two rounds of communication. This provides an alternative to IBMS/IBAS schemes based on bilinear maps, especially in applications which intrinsically take two communication rounds, such as authenticated route discovery or aggregation of broadcast acknowledgements. Since bilinear map operations are still more expensive than RSA exponentiations, our computational costs are slightly lower in signing and significantly lower in verification compared to e.g. [GR06], although our signatures are longer. A summary of these comparisons is given in Figure 1.

Further Related Work. Gregory Neven introduced two primitives, sequential aggregate signed data and multi-signed data, corresponding to aggregate signatures and multisignatures respectively, whose goal is to minimize the total bandwidth consumed by signatures and messages in the transmission of authenticated data originating from multiple sources [Nev08]. His constructions use message recovery techniques to squeeze message bits into a (multi-/aggregate) signature. Comparing his work to ours, we note that (1) his schemes support only sequential aggregation when signing different messages; (2) bandwidth savings depend on message sizes (for small messages the bandwidth can be worse than with standard signatures); (3) these schemes do not address the overhead due to public keys, which raises the interesting question of whether the total bandwidth due to signatures and messages can be further reduced with identity-based multi-/aggregate signatures, perhaps using message-recovery techniques. In other related work, Herranz and Galindo et al. [Her06, GHK06] show identity-based signatures which can be aggregated if they originate from the same signer.

Identity-Based Aggregate and Multi-Signature Schemes Based on RSA 483

Organization/Roadmap. Section 2 contains a technical overview of our constructions. In Section 3 we define IBMS schemes. (We relegate a formal description of IBAS schemes to [BJ10].) In Section 4 we develop our tools: we introduce structured-instance zero-knowledge (ZK) proofs and Σ-equivocable commitments, and we show that Σ-equivocable commitments suffice to compile a class of Σ-protocols, which includes an RSA-based identification protocol (a proof of knowledge of an e-th root), into straight-line simulatable structured-instance ZK. In Section 5 we show homomorphic Σ-equivocable commitments secure under RSA. By the results of Section 4, this implies an aggregatable structured-instance ZK proof of knowledge of an e-th root, which leads to an IBMS scheme construction, described in Section 6, and an IBAS scheme, sketched in Section 7.

2 Technical Overview 

Our IBAS/IBMS scheme is a multi-prover version of the Guillou-Quisquater (GQ) signature [GQ88]. The ID-based version of the GQ signature is a non-interactive zero-knowledge (NIZK) proof of knowledge (PK) of an e-th root modulo $n$ (in the ROM). Let $y = H(\mathrm{ID})$ be an element of $\mathbb{Z}_n^*$ and let $x$ be the e-th root of $y$, the private key corresponding to identity ID. (Such a private key can be computed by the PKG, who knows the factorization of $n$.) To sign message $m$, the signer with identity ID follows the ROM-based NIZK PK of the e-th root of $y$: it computes the first proof message $a = k^e$ for a random $k \in \mathbb{Z}_n^*$, obtains the challenge $c$ by querying $(m, a)$ to a hash function (modeled as a random oracle), and computes the response $z = kx^c$. The signature is $(a, z)$, which is verified by checking whether $z^e = ay^c$ for $c = H(m, a)$. Due to the homomorphic property of exponentiation, one might hope to obtain an IBAS/IBMS scheme by aggregating such ROM-based NIZK PKs of e-th roots made by several cosigners. For instance, consider a two-round protocol built along the lines of the DL-based multisignature scheme of [MOR01]: in the first round each player broadcasts its first message $a_i$. All players obtain a common challenge $c$ by querying the hash function on an input which includes $a = \prod_i a_i$ and the message being signed. Finally, each player broadcasts its response $z_i$ to this challenge. The multisignature is $(a, z)$ where $z = \prod_i z_i$. Note that if $z_i^e = a_i y_i^c$ for each $i$, then $(a, z)$ satisfies the verification equation $z^e = a\left(\prod_i y_i\right)^c$, where $y_i = H(\mathrm{ID}_i)$. We believe that an adaptation of the security proof of [MOR01] would show security of this scheme, but the resulting security argument would have several limitations: (1) the reduction would be only from the expected-time hardness of the RSA problem; (2) it would suffer substantial security degradation due to extensive use of rewinding; (3) it would therefore not extend to concurrent executions of multiple instances of this scheme.
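The following Python sketch illustrates the GQ identity-based signature and the completeness of the draft two-round aggregation described above. It is a toy under explicit assumptions: the textbook RSA parameters $n = 3233$, $e = 17$ offer no security, `hash_to_zn` and `challenge` are hypothetical SHA-256-based instantiations of $H$ (the challenge is reduced modulo $e$ to mirror $c < 2^\kappa < e$), and both rounds of the draft protocol are run locally in one function, with no network and no adversary.

```python
import hashlib
import math
import secrets

# Toy textbook RSA parameters: N = 61 * 53, E * D = 1 mod phi(N). Insecure.
N, E, D = 3233, 17, 2753


def hash_to_zn(data: bytes) -> int:
    """Hypothetical H: {0,1}* -> Z_N^*, SHA-256 with a retry counter."""
    ctr = 0
    while True:
        h = hashlib.sha256(data + ctr.to_bytes(4, "big")).digest()
        v = int.from_bytes(h, "big") % N
        if math.gcd(v, N) == 1:
            return v
        ctr += 1


def challenge(msg: bytes, a: int) -> int:
    """Hypothetical c = H(m, a), reduced so that 0 <= c < E."""
    h = hashlib.sha256(msg + b"|" + a.to_bytes(2, "big")).digest()
    return int.from_bytes(h, "big") % E


def random_unit() -> int:
    while True:
        v = secrets.randbelow(N - 1) + 1
        if math.gcd(v, N) == 1:
            return v


def key_der(identity: bytes) -> int:
    """PKG side: x = H(ID)^D mod N is the E-th root of y = H(ID)."""
    return pow(hash_to_zn(identity), D, N)


def gq_sign(x: int, msg: bytes):
    k = random_unit()
    a = pow(k, E, N)                      # first message a = k^e
    c = challenge(msg, a)                 # c = H(m, a)
    return a, (k * pow(x, c, N)) % N      # response z = k * x^c


def gq_verify(identity: bytes, msg: bytes, sig) -> bool:
    a, z = sig
    y = hash_to_zn(identity)
    return pow(z, E, N) == (a * pow(y, challenge(msg, a), N)) % N


def draft_multisign(secret_keys, msg: bytes):
    """Draft two-round protocol with all players simulated locally."""
    ks = [random_unit() for _ in secret_keys]   # round 1: a_i = k_i^e
    a = 1
    for k in ks:
        a = (a * pow(k, E, N)) % N              # a = prod a_i
    c = challenge(msg, a)                       # common challenge
    z = 1
    for k, x in zip(ks, secret_keys):
        z = (z * k * pow(x, c, N)) % N          # round 2: z = prod z_i
    return a, z


def multi_verify(identities, msg: bytes, sig) -> bool:
    a, z = sig
    y = 1
    for ident in identities:
        y = (y * hash_to_zn(ident)) % N         # prod y_i
    return pow(z, E, N) == (a * pow(y, challenge(msg, a), N)) % N
```

The sketch only demonstrates that the draft scheme verifies; as the text explains, the difficulty lies in its security argument, which this code does not address.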

To explain how we overcome these limitations, we first explain why they appear in the above draft scheme. The simulator for the NIZK PK of an e-th root picks a random challenge $c$ and a random $z \in \mathbb{Z}_n^*$, computes the prover's first message as $a = z^e y^{-c}$, and defines the hash of $(m, a)$ as $c$, which it can do because it controls the hash function $H$. Note that since the adversary has no information about $a$, there is only a negligible chance that it queries $H$ on the same $(m, a)$ before the simulator attempts to define its value as $c$; hence the simulation succeeds with overwhelming probability. The fundamental difference between this simulation and the simulation of the aggregated proof in the draft scheme above is that in the aggregated proof, corrupt cosigners can choose their contributions $a_i$ on the basis of the $a_j$'s broadcast by the honest cosigners. Consequently, the simulator can only guess the resulting value $a$ with probability $1/q_H$, where $q_H$ is the number of hash queries the adversary makes. This gives rise to a simulation procedure which rewinds the adversary an expected $q_H$ times in each signature instance, which causes all the limitations listed above: reduction to expected-time hardness, a loose security reduction, and no argument for the security of concurrent protocol instances.
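The single-signer simulation step can be checked numerically. The sketch below, again with hypothetical toy parameters ($n = 3233$, $e = 17$, insecure), produces a transcript $(a, c, z)$ for $y$ without knowing its e-th root, and records the programmed oracle value in a plain dictionary that stands in for the random oracle; it aborts, as in the argument above, if $H(m, a)$ was already defined. It assumes Python 3.8+ for `pow` with a negative exponent.

```python
import math
import secrets

N, E = 3233, 17  # toy RSA parameters (insecure, illustration only)


def random_unit() -> int:
    while True:
        v = secrets.randbelow(N - 1) + 1
        if math.gcd(v, N) == 1:
            return v


def simulate_gq(y: int, msg: bytes, oracle: dict):
    """Simulate a GQ proof for y without its e-th root by programming H."""
    c = secrets.randbelow(E)                  # pick the challenge first
    z = random_unit()                         # response, uniform in Z_N^*
    a = (pow(z, E, N) * pow(y, -c, N)) % N    # a = z^E * y^(-c)
    if (msg, a) in oracle:                    # H(m, a) already fixed: abort
        return None
    oracle[(msg, a)] = c                      # program H(m, a) := c
    return a, c, z
```

Because $z$ is uniform and $z \mapsto z^e$ permutes $\mathbb{Z}_n^*$, the simulated $a$ is uniform, which is why a collision with an earlier adversarial query is unlikely for a realistic modulus.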

Bellare and Neven [BN07] showed how to overcome all these issues in the ROM by adding an extra communication round in which each player first commits to its contribution $a_i$ by broadcasting a hash of $a_i$. By controlling the hash function $H$, the simulator can learn the $a_i$'s committed by the adversary and then decide on the $a_i$'s published on behalf of the honest players. This way the simulator succeeds without rewinding with overwhelming probability, similarly to the NIZK simulation sketched above. The main technical challenge we handle in this work is how to achieve such straight-line simulation without introducing the extra communication round, i.e. with only two rounds of interaction. Our technique is a variant of Damgård's HVZK-to-ZK compilation [Dam00], which constructs a straight-line simulatable zero-knowledge proof from any Σ-protocol using an equivocable commitment scheme, but with an interesting twist. In Damgård's scheme a signer commits to its value $a_i$ using an equivocable commitment scheme, and the simulator, on any challenge $c$, can open this commitment to the value $a_i = z_i^e y_i^{-c}$ needed for the proof to verify (where the response $z_i$ is chosen at random, to match the response distribution in the real proof). However, to create an IBMS/IBAS scheme by aggregating such proofs we need this commitment scheme to be multiplicatively homomorphic, and to the best of our knowledge no efficient commitment scheme is both equivocable and multiplicatively homomorphic. Instead, we show a commitment scheme which is multiplicatively homomorphic over $\mathbb{Z}_n^*$ and satisfies a restricted form of equivocability, which we call Σ-equivocability, and which suffices for straight-line simulation of a Σ-protocol compiled as above. For example, a Σ-equivocable commitment for the relation $R = \{(x, y) \mid y = x^e\}$ allows for equivocation of commitments to messages of the form $z^e y^{-c}$ for any $c$ and $z$, and this is exactly the form of the message $a$ which the simulator needs in the above proof.

The idea of using commitments with similarly restricted equivocability appeared before in [BCJ08], where it was used to construct a straight-line simulatable and aggregatable proof of knowledge of a discrete logarithm (DL), and a DL-based multisignature scheme. However, the equivocability notion (and the construction) of [BCJ08] gives rise only to single-instance zero-knowledge proofs. Intuitively, this suffices for the security of multisignatures (as opposed to identity-based multisignatures) because in multisignatures the adversary w.l.o.g. corrupts all players except one, so the simulator needs to embed its challenge problem in just one public key, and needs to simulate the multisignature protocol on behalf of only that one player. Using this form of equivocation in the security argument for identity-based schemes would introduce a security degradation by a factor of $q_H$, the number of hash function queries, because the simulator would have to guess the single identity into which to embed its challenge. Here we define a more general notion of Σ-equivocability which allows for straight-line simulatable "structured-instance" zero-knowledge proofs in the CRS model. In structured-instance zero-knowledge proofs, formalized in this paper, the simulator can simulate proofs for any statement in a class of related instances, in contrast to a single statement in single-instance ZK and any instance in (standard) multi-instance ZK. The class of instances which is particularly useful in showing a security reduction for an IBMS/IBAS scheme based on a Σ-protocol for proving knowledge of a preimage of the function $f(x) = x^e$ consists of instances of the form $\bar{y} = y \cdot f(\delta)$, where $y$ is the simulator's challenge. In this way the simulator can embed its challenge into any number of identities, picking a random $\delta$ for each identity, and yet straight-line simulate the proofs performed on behalf of all these identities in parallel. Thus our main technical contribution is two-fold. First, we formalize the notion of Σ-equivocability and apply it to a compilation from Σ-protocols to straight-line simulatable structured-instance ZKPKs (Section 4). Second, we construct a multiplicatively homomorphic and Σ-equivocable commitment scheme based on the RSA problem (Section 5). Together, these two parts immediately imply the IBMS and IBAS schemes we present in this paper (Sections 6 and 7).
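The embedding of one challenge into many identities can be sketched numerically under the same toy assumptions as before (hypothetical $n = 3233$, $e = 17$, Python 3.8+). Each identity's hash is programmed to $y \cdot \delta^e$ for a fresh random $\delta$; this value is uniform in $\mathbb{Z}_n^*$ because $x \mapsto x^e$ is a permutation, and an e-th root of any one programmed value yields an e-th root of the core challenge $y$ after dividing by $\delta$. In the usage check, the forger's output is modeled simply as $x\delta$.

```python
import math
import secrets

N, E = 3233, 17  # toy RSA parameters (insecure, illustration only)


def random_unit() -> int:
    while True:
        v = secrets.randbelow(N - 1) + 1
        if math.gcd(v, N) == 1:
            return v


def embed_challenge(y_core: int, identities):
    """Program H(ID) := y * delta^E with a fresh random delta per identity."""
    shifts, programmed = {}, {}
    for ident in identities:
        delta = random_unit()
        shifts[ident] = delta
        programmed[ident] = (y_core * pow(delta, E, N)) % N
    return programmed, shifts


def core_root(x_bar: int, delta: int) -> int:
    """If x_bar^E = y * delta^E (mod N), then (x_bar / delta)^E = y."""
    return (x_bar * pow(delta, -1, N)) % N
```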

3 Identity-Based Multi-/Aggregate Signature Schemes 

We define the notion of an identity-based multisignature (IBMS) scheme building on the definitions given by [MOR01, BN06, GR06, BN07]. (Due to lack of space, we relegate the extension of our definitions to IBAS schemes to the full version of the paper [BJ10].) Our notion is more flexible than that of [BN07, MOR01, BN06] because we do not require the set of participants' identities as input to the multi-/aggregate signature protocol. The participating players must be aware of each other in the protocol execution, but this is needed only to ensure proper communication, and the participant identities are not required as inputs to the cryptographic protocol. Schemes secure in this setting give flexibility to applications of multi-/aggregate signatures, because signers might sometimes care only about the message they are signing and not about the identities of the cosigners. Otherwise, the list of cosigners can always be attached to the message being signed.

Syntax of an IBMS Scheme. We define an identity-based multisignature scheme as IBMS = (Setup, KeyDer, MSign, Vrfy), where Setup, KeyDer and Vrfy are probabilistic poly-time algorithms, and MSign is a distributed protocol executed by a set of parties, such that:

- $(mpk, msk) \leftarrow \mathrm{Setup}(1^\kappa)$, run by a trusted party on input the security parameter $\kappa$, generates a master public key $mpk$ and the corresponding master secret key $msk$.

- $sk_{Id} \leftarrow \mathrm{KeyDer}(msk, Id)$, run by a trusted party on input the master secret key $msk$ and an identity $Id \in \{0,1\}^*$, provides a secret key $sk_{Id}$ to the user with identity $Id$.

- MSign is a multisignature protocol run by a group of players who intend to sign the same message $m$. The player with identity $Id$ executes this protocol on public inputs $mpk$ and message $m$ and on private input $sk_{Id}$, its own secret key. The local output of the protocol for every participant is a multisignature denoted $\sigma$.

- $b \leftarrow \mathrm{Vrfy}(mpk, m, IdSet, \sigma)$, with $b \in \{0,1\}$, verifies whether $\sigma$ is a valid multisignature on message $m$ on behalf of the set of identities $IdSet$.

In the random oracle model (ROM), the KeyDer, MSign and Vrfy procedures additionally have access to a random oracle $H(\cdot) : \{0,1\}^* \to D$, where $D$ depends on the scheme. These procedures must satisfy the following completeness property: for any integer $n$, any message $m$, and any $(mpk, msk)$ output by $\mathrm{Setup}(1^\kappa)$, if for $i = 1, \ldots, n$ one obtains $sk_{Id_i} \leftarrow \mathrm{KeyDer}(msk, Id_i)$ and the players correctly follow MSign on input $m$ using the secret keys $sk_{Id_i}$, then, assuming all messages are delivered between players, each player outputs the same string $\sigma$, which satisfies $\mathrm{Vrfy}(mpk, m, \{Id_1, \ldots, Id_n\}, \sigma) = 1$.

Security Notion of an IBMS Scheme. We model security as existential unforgeability under an adaptive chosen-message and adaptive chosen-identity attack. The adversary participates in a game in which it issues a number of key derivation and signature queries. In a key derivation query, the adversary corrupts a player by submitting its identity $Id$ to the key derivation oracle and receiving its secret key $sk_{Id}$. In a signature query, the adversary specifies the message $m$ and the identity $Id$ that it wants to interact with, and the signing oracle performs the MSign protocol on message $m$ on behalf of $Id$. The adversary wins the game if it eventually outputs a message $m$, a multisignature $\sigma$ and a set of identities $IdSet$ such that $\mathrm{Vrfy}(mpk, m, IdSet, \sigma) = 1$ and there exists an identity $Id \in IdSet$ such that the adversary never queried the key derivation oracle on $Id$ and never queried the signing oracle on $(m, Id)$. More formally, we define the advantage of an adversary $A$ against IBMS = (Setup, KeyDer, MSign, Vrfy) as the probability that the experiment $\mathrm{Exp}^{\mathrm{IBMS}}_{A}$ described in Figure 2 outputs 1, i.e.

$$\mathrm{Adv}^{\mathrm{IBMS}}_{A} = \Pr\left[\mathrm{Exp}^{\mathrm{IBMS}}_{A} = 1\right],$$

where the probability goes over the random coins of the adversary and all the randomness used in the experiment. We call an IBMS scheme $(t, \epsilon, n, q_K, q_S)$-secure if $\mathrm{Adv}^{\mathrm{IBMS}}_{A} < \epsilon$ for every adversary $A$ that runs in time at most $t$, makes at most $q_K$ key derivation queries and at most $q_S$ signature queries, and produces a forgery on behalf of at most $n$ parties. In the random oracle model we extend this notion to $(t, \epsilon, n, q_K, q_S, q_H)$-security, where $A$ is additionally restricted to at most $q_H$ hash queries and the probability in the experiment $\mathrm{Exp}^{\mathrm{IBMS}}_{A}$ also goes over the random choice of the hash function.
Experiment $\mathrm{Exp}^{\mathrm{IBMS}}_{A}$

- $(mpk, msk) \leftarrow \mathrm{Setup}(1^\kappa)$; $MIdLst \leftarrow \emptyset$; $CIdLst \leftarrow \emptyset$;

- Run $A(mpk)$, and handle $A$'s key derivation and signature queries as follows:

- On a key derivation query on identity $Id$, add $Id$ to $CIdLst$, run KeyDer on input $(msk, Id)$ and return $sk_{Id}$ to $A$.

- On a signing query on pair $(m, Id)$, add $(m, Id)$ to $MIdLst$, run the MSign protocol on behalf of identity $Id$ on message $m$, forwarding messages to and from $A$.

- When $A$ halts, parse its output as $(m, IdSet, \sigma)$.

- If $\mathrm{Vrfy}(mpk, m, IdSet, \sigma) = 1 \wedge \left(\exists\, Id \in IdSet \text{ s.t. } Id \notin CIdLst \wedge (m, Id) \notin MIdLst\right)$ then return 1, otherwise return 0.


Fig. 2. Chosen Message Attack against an Identity-Based Multisignature Scheme
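The winning condition in the last step of Figure 2 can be expressed as a small predicate. In this hypothetical sketch the verification algorithm is passed in as a callable, and the two bookkeeping lists $CIdLst$ and $MIdLst$ are modeled as Python sets; nothing here is specific to any concrete IBMS scheme.

```python
from typing import Any, Callable, FrozenSet, Iterable, Set, Tuple

# Hypothetical type for Vrfy(mpk, m, IdSet, sigma) -> bool.
Vrfy = Callable[[Any, bytes, FrozenSet[bytes], Any], bool]


def forgery_wins(vrfy: Vrfy, mpk: Any, m: bytes, id_set: Iterable[bytes],
                 sigma: Any, cid_lst: Set[bytes],
                 mid_lst: Set[Tuple[bytes, bytes]]) -> bool:
    """Return True iff (m, IdSet, sigma) is a winning forgery per Fig. 2."""
    ids = frozenset(id_set)
    if not vrfy(mpk, m, ids, sigma):
        return False
    # Some identity in IdSet must be neither corrupted nor asked to sign m.
    return any(i not in cid_lst and (m, i) not in mid_lst for i in ids)
```

Note that a signing query on $(m', Id)$ for a different message $m' \neq m$ does not disqualify $Id$ as the honest identity of a forgery on $m$.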


4 Σ-Equivocable Commitments and Structured-Instance Zero-Knowledge

Homomorphic Σ-Protocols. A Σ-protocol, a notion introduced by Cramer, Damgård and Schoenmakers [CDS94], is a three-move proof system with special honest-verifier zero-knowledge (HVZK) and strong soundness properties. Let $R = \{(x, y)\}$ be a relation whose membership can be verified in polynomial time. We consider the special case where $X$ and $Y$ are algebraic groups (for notational simplicity we use multiplicative notation for both), and $R = \{(x, f(x)) \mid x \in X\}$, where $f : X \to Y$ is a homomorphic one-way function. We consider a proof of knowledge system for relation $R$ which we call a homomorphic Σ-protocol (for $R$): the prover, on input $x \in X$, sends $a = f(k)$ where $k \leftarrow X$. The verifier, on input $y \in Y$, creates a challenge $c$ as a random $\kappa$-bit string, and the prover responds with $z = kx^c$. The verifier accepts iff $f(z) = ay^c$. Several Σ-protocols for known homomorphic one-way functions take this form, e.g. the Guillou-Quisquater identification scheme [GQ88] for the power function $f_{e,n}(x) = x^e \bmod n$ and Schnorr's scheme [Sch89] for exponentiation $f_{g,p}(x) = g^x \bmod p$. The special HVZK property of a Σ-protocol says that there exists an efficient simulator which, on input $y$ and any challenge $c$, computes a pair $(a, z)$ whose distribution matches that of the prover. The special strong soundness property says that there exists an efficient extractor which, for any $y$, computes a witness $x$ such that $(x, y) \in R$ from any pair of accepting conversations $(a, c, z)$ and $(a, c', z')$ with $c \neq c'$.
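For the GQ instance $f(x) = x^e \bmod n$, the extractor can be made concrete with the extended Euclidean algorithm: from $z_1^e = ay^{c_1}$ and $z_2^e = ay^{c_2}$ one gets $(z_1/z_2)^e = y^{c_1 - c_2}$, and if $ue + v(c_1 - c_2) = 1$ then $x = y^u (z_1/z_2)^v$ satisfies $x^e = y$. The sketch below assumes toy parameters ($n = 3233$, prime $e = 17$, insecure) and challenges in $[0, e)$, so that $\gcd(e, c_1 - c_2) = 1$ holds automatically; it requires Python 3.8+ for `pow` with negative exponents.

```python
# Toy RSA parameters (textbook example; insecure, for illustration only).
N, E = 3233, 17  # N = 61 * 53, E prime


def egcd(a: int, b: int):
    """Return (g, u, v) with u*a + v*b = g = gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, u, v = egcd(b, a % b)
    return g, v, u - (a // b) * v


def gq_extract(y: int, c1: int, z1: int, c2: int, z2: int) -> int:
    """Recover x with x^E = y (mod N) from two accepting GQ transcripts.

    Assumes z1^E = a*y^c1 and z2^E = a*y^c2 (mod N) for the same a,
    with c1 != c2 and 0 <= c1, c2 < E (so gcd(E, c1 - c2) = 1).
    """
    if c1 < c2:                               # make c1 - c2 positive
        c1, z1, c2, z2 = c2, z2, c1, z1
    ratio = (z1 * pow(z2, -1, N)) % N         # ratio^E = y^(c1 - c2)
    g, u, v = egcd(E, c1 - c2)                # u*E + v*(c1 - c2) = 1
    assert g == 1
    # y = y^(u*E + v*(c1 - c2)) = (y^u * ratio^v)^E
    return (pow(y, u, N) * pow(ratio, v, N)) % N
```

Schnorr's scheme admits the analogous extractor $x = (z_1 - z_2)/(c_1 - c_2)$ modulo the group order, since there the group $X$ is additive.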

Structured-Instance Zero-Knowledge. Multi-instance zero-knowledge (ZK) (a.k.a. multi-theorem ZK) in the common reference string (CRS) model requires a two-phase probabilistic poly-time simulator such that (1) in the first phase, given the public parameters, the simulator outputs the CRS string together with some trapdoor information; (2) in the second phase, given a statement and the trapdoor, the simulator outputs a simulated proof for that statement. In single-instance ZK, the simulator knows the statement beforehand and can set the CRS string as a function of this particular statement. Structured-instance zero-knowledge for the relation $R$ introduced above is an intermediate notion: the simulator is given a "core statement" $y \in Y$ before it sets the CRS string, and can then simulate the proof for the statement $\bar{y} = y \cdot f(\delta)$ for any $\delta \in X$. Here is the formal definition:

Definition 1. Let $X$ and $Y$ be algebraic groups and $f : X \to Y$ be a surjective homomorphic one-way function, all indexed by a public parameter $par$. Let $\Pi = (\mathcal{G}, P, V)$ be a proof system in the CRS model for the relation $R = \{(x, y) \in X \times Y \mid y = f(x)\}$, where $\mathcal{G}$ is an algorithm that outputs the common reference string. We say that $\Pi$ is straight-line $\epsilon$-structured-instance zero-knowledge if there exist efficient algorithms $S_1, S_2$ such that $S_1$, on input $par$ and a core instance $y \in Y$, outputs the CRS string $\sigma$ and a trapdoor $td$, while $S_2$, on input $td$ and a "witness-shift" $\delta \in X$, outputs a simulated proof $\pi$ for the instance $\bar{y} = y f(\delta)$, and for all $(x, y) \in X \times Y$ such that $f(x) = y$ the following two properties hold:

1. The statistical difference between the following two distributions is at most $\epsilon$:

$$\{\sigma \mid (\sigma, td) \leftarrow S_1(par, y)\} \qquad \{\sigma \mid \sigma \leftarrow \mathcal{G}(par)\}$$

2. For every verifier $V^*$ and every $\delta \in X$, the following two distributions are identical:

$$\{\pi \mid \pi \leftarrow V^*(\bar{y}, \sigma)^{S_2(td, \delta)};\ (\sigma, td) \leftarrow S_1(par, y);\ \bar{y} \leftarrow y f(\delta)\}$$

$$\{\pi \mid \pi \leftarrow V^*(\bar{y}, \sigma)^{P(\bar{x}, \sigma)};\ \sigma \leftarrow \mathcal{G}(par);\ \bar{y} \leftarrow y f(\delta);\ \bar{x} \leftarrow x\delta\}$$

Commitment Schemes. A commitment scheme $\mathcal{C}$ in the CRS model consists of probabilistic poly-time algorithms CSetup, CKG, Com and Open. CSetup, on input the security parameter $\kappa$, generates public parameters $cpar$, which also determine the commitment message space $\mathcal{M}$. $\mathrm{CKG}(cpar)$ generates the commitment key $K$, $\mathrm{Com}_K(m)$ generates the commitment $C$ and the decommitment $D$ on message $m \in \mathcal{M}$, and finally $\mathrm{Open}_K(C, D, m)$ determines whether $D$ is a valid decommitment of commitment $C$ to message $m$. A commitment scheme must satisfy that if $cpar \leftarrow \mathrm{CSetup}(1^\kappa)$, $K \leftarrow \mathrm{CKG}(cpar)$, and $(C, D) \leftarrow \mathrm{Com}_K(m)$, then $\mathrm{Open}_K(C, D, m) = 1$. Below we define the statistical hiding and computational binding properties of commitments, because our scheme satisfies variants of these notions.

$\epsilon$-Hiding: For all $cpar \leftarrow \mathrm{CSetup}(1^\kappa)$, $m_0, m_1 \in \mathcal{M}$, and $K \leftarrow \mathrm{CKG}(cpar)$, there is less than $\epsilon$ statistical difference between the distribution of $C$'s output by $\mathrm{Com}_K(m_0)$ and the distribution of $C$'s output by $\mathrm{Com}_K(m_1)$. A commitment scheme is perfectly hiding if $\epsilon = 0$.

$(t, \epsilon)$-Binding: For any algorithm $A$ running in time $t$ and any $cpar$ output by $\mathrm{CSetup}(1^\kappa)$, the probability that $\mathrm{Open}_K(C, D_0, m_0) = \mathrm{Open}_K(C, D_1, m_1) = 1$ and $m_0 \neq m_1$ is less than $\epsilon$, where $(C, D_0, D_1, m_0, m_1)$ is output by $A$ on input $K$, $K \leftarrow \mathrm{CKG}(cpar)$, and the probability is over the coins of CKG and $A$.

Notation: In this paper we only deal with commitment schemes in which the commitment is a deterministic function of the message and the decommitment. Therefore we assume there exists a decommitment space, denoted $\mathcal{R}$, and the Com procedure picks a decommitment $D \leftarrow \mathcal{R}$ and computes the commitment $C$ as a deterministic function of $m$ and $D$.

Σ-Equivocable Commitments. A commitment scheme is equivocable if there exists an efficient simulator that generates a commitment key $K$, indistinguishable from a real key, together with a trapdoor $td$. The trapdoor allows the simulator to create fake commitments indistinguishable from real ones, and later decommit them to any message. Using equivocable commitments, one can compile a Σ-protocol into a multi-instance ZK proof system with straight-line simulation [Dam00]. Here we define a rather restrictive form of equivocability, called Σ-equivocability, and we show that it is sufficient for compiling Σ-protocols into structured-instance ZK proofs with straight-line simulation. It turns out that structured-instance ZK is sufficient for our application of ZK proofs to multi-/aggregate signatures, and multi-instance ZK is not required. Moreover, the straight-line simulatability of this proof system allows us to obtain multi-/aggregate signature schemes that support concurrency, have better exact security, and have improved round complexity.


Definition 2. Let $X$ and $Y$ be algebraic groups and let $f : X \to Y$ be a homomorphic one-way function, all indexed by a commitment parameter $cpar$. We call a commitment scheme $\epsilon$-Σ-equivocable for $f$ if there exist probabilistic poly-time algorithms tdCKG, tdCom, and RstEquiv, where $(K, td) \leftarrow \mathrm{tdCKG}(cpar, y)$, $(C, st) \leftarrow \mathrm{tdCom}_K(td)$, and $(D, z) \leftarrow \mathrm{RstEquiv}_K(td, st, c, \delta)$, such that for any $cpar$ output by CSetup and any $y \in Y$ the following properties hold:

1. There is at most $\epsilon$ statistical difference between the distribution of $K$'s output by $\mathrm{CKG}(cpar)$ and $K$'s output by $\mathrm{tdCKG}(cpar, y)$.

2. For all $(K, td) \leftarrow \mathrm{tdCKG}(cpar, y)$, $\delta \in X$, and $c \in \{0,1\}^\kappa$, if $(C, st)$ is output by $\mathrm{tdCom}_K(td)$ and $(D, z)$ is output by $\mathrm{RstEquiv}_K(td, st, c, \delta)$, then $D$ is distributed as a random decommitment in $\mathcal{R}$ and $\mathrm{Open}_K\left(C, D, f(z)(y f(\delta))^{-c}\right) = 1$.

Intuitively, Definition 2 says that the equivocation procedure, given $(y, c, \delta)$, can open a fake commitment to a message of the form $a = f(z)(y f(\delta))^{-c}$ for some $z$. This is useful in the straight-line simulation of a proof of knowledge for the relation $R = \{(x, y) \in X \times Y \mid y = f(x)\}$. For example, let $f : QR_n \to QR_n$ where $f(z) = z^e \pmod n$. Consider the HVZK simulator of the Σ-protocol for proving knowledge of an e-th root: this simulator picks random $c$ and $z$ and computes the prover's first message $a = z^e y^{-c}$. Below we show that Damgård's compilation [Dam00] (see Figure 3) transforms such a Σ-protocol into structured-instance zero-knowledge using only such Σ-equivocable commitments, because the simulator can output a fake commitment and then open it to what the Σ-protocol simulator would output as the prover's first message, i.e. $a = z^e y^{-c}$. Definition 2 implies that a fake commitment can be opened to $a = z^e (y\delta^e)^{-c}$ for any $\delta$ and $c$. Hence the structured-instance zero-knowledge simulator can use this property to simulate a proof for any instance $\bar{y} = y\delta^e$, where $y$ is set before the simulator creates the CRS string (see Theorem 1).
Common reference string: commitment key $K$ of a Σ-equivocable commitment scheme

Prover $P(x)$, where $x \in X$ and $f(x) = y$; Verifier $V(y)$, where $y \in Y$.

1. $P$: $k \leftarrow X$, $a \leftarrow f(k)$, $(C, D) \leftarrow \mathrm{Com}_K(a)$; send $C$ to $V$.
2. $V$: $c \leftarrow \{0,1\}^\kappa$; send $c$ to $P$.
3. $P$: $z \leftarrow kx^c$; send $(z, D)$ to $V$.
4. $V$: accept iff $\mathrm{Open}_K(C, D, f(z)y^{-c}) = 1$.


Fig. 3. Straight-line simulatable structured-instance ZKPK of pre-image of $f$
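To make the message flow of Figure 3 concrete, here is a toy Python run of the compiled protocol for $f(x) = x^e \bmod n$. The commitment used is an assumption made only for illustration: a SHA-256 hash commitment $C = \mathrm{SHA256}(D \parallel a)$ with random $D$, which is neither Σ-equivocable nor homomorphic, so the sketch shows only the honest prover-verifier interaction, not the simulator and not the RSA-based commitment of Section 5. Parameters ($n = 3233$, $e = 17$, 4-bit challenges) are hypothetical and insecure; Python 3.8+ is assumed.

```python
import hashlib
import math
import secrets

N, E = 3233, 17  # toy RSA modulus and prime exponent (insecure)
KAPPA = 4        # toy challenge length: c < 2^KAPPA < E


def f(v: int) -> int:
    """The homomorphic one-way function f(v) = v^E mod N."""
    return pow(v, E, N)


def com(a: int):
    """Hypothetical hash commitment: C = SHA256(D || a) with random D."""
    dec = secrets.token_bytes(16)
    return hashlib.sha256(dec + a.to_bytes(2, "big")).digest(), dec


def open_commitment(cmt: bytes, dec: bytes, msg: int) -> bool:
    return hashlib.sha256(dec + msg.to_bytes(2, "big")).digest() == cmt


def random_unit() -> int:
    while True:
        v = secrets.randbelow(N - 1) + 1
        if math.gcd(v, N) == 1:
            return v


def run_protocol(x: int, y: int) -> bool:
    """One honest execution of the compiled three-move protocol."""
    k = random_unit()                  # P: k <- X, a <- f(k)
    a = f(k)
    cmt, dec = com(a)                  # P: send C = Com_K(a), not a itself
    c = secrets.randbelow(2 ** KAPPA)  # V: random KAPPA-bit challenge
    z = (k * pow(x, c, N)) % N         # P: z = k * x^c, sent with D
    # V: accept iff C opens to f(z) * y^(-c), which equals a for honest P.
    return open_commitment(cmt, dec, (f(z) * pow(y, -c, N)) % N)
```

The verifier recomputes the first message as $f(z)y^{-c} = k^e x^{ec} y^{-c} = k^e = a$, which is why $a$ itself never needs to be sent in the clear.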


Homomorphic Commitments. We call a commitment scheme multiplicatively homomorphic if there are efficiently computable operations $\otimes$ and $\odot$ such that if $\mathrm{Open}_K(C_1, D_1, m_1) = 1$ and $\mathrm{Open}_K(C_2, D_2, m_2) = 1$, then $\mathrm{Open}_K(C, D, m) = 1$ for $C = C_1 \otimes C_2$, $D = D_1 \odot D_2$, and $m = m_1 m_2$. Accordingly, a commitment scheme is $l$-restricted multiplicatively homomorphic if the homomorphic operation can be applied to at most $l$ commitment-decommitment pairs generated by the Com procedure. Our construction is $l$-restricted multiplicatively homomorphic.

Structured-Instance Zero-Knowledge from Homomorphic Σ-Protocol.

Figure 3 shows a construction of a straight-line simulatable structured-instance zero-knowledge proof of knowledge system, in the CRS model, from a homomorphic Σ-protocol and a Σ-equivocable commitment. This construction is identical to Damgård's compiler from Σ-protocols to ZKPK proofs [Dam00]. Below we show that, given a homomorphic Σ-protocol, the same compilation produces a structured-instance zero-knowledge proof using only Σ-equivocable commitments. As in [Dam00], the resulting protocol is an argument of knowledge, subject to the binding property of the commitment scheme.
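As a concrete reference point, the following Python sketch runs the underlying homomorphic Σ-protocol for $f(x) = x^e \bmod n$ (the GQ protocol) without the commitment layer. The verifier's check computes exactly the value $f(z)y^{-c}$ that the compiled protocol of Figure 3 passes to $\mathrm{Open}_K$. The toy modulus $n = 23 \cdot 47$ and exponent $e = 5$ are assumptions chosen only so the example runs instantly; they provide no security.

```python
import math
import random

# Toy parameters (assumption, insecure): n = 23 * 47, e = 5 with gcd(e, phi(n)) = 1.
N = 23 * 47
E = 5

def f(x):
    """Homomorphic one-way function f(x) = x^e mod n, so f(ab) = f(a) f(b)."""
    return pow(x, E, N)

def prover_round(x, c, rng):
    """Prover side: first message a = f(k) for a fresh unit k, response z = k * x^c."""
    while True:
        k = rng.randrange(2, N)
        if math.gcd(k, N) == 1:      # k must be a unit of Z_n^*
            break
    a = f(k)
    z = k * pow(x, c, N) % N
    return a, z

def verify(y, a, c, z):
    """Accept iff f(z) * y^(-c) mod n equals the first message a."""
    return f(z) * pow(y, -c, N) % N == a
```

In the compiled protocol the prover sends a commitment to $a$ instead of $a$ itself and later reveals the decommitment, so the verifier's equality test above becomes the call to $\mathrm{Open}_K$.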

Theorem 1. Let $X$ and $Y$ be algebraic groups, $f : X \rightarrow Y$ a homomorphic one-way function, and $\mathcal{C}$ a Σ-equivocable commitment over message space $\mathcal{M} \subseteq Y$. Then the protocol in Figure 3 is a straight-line simulatable structured-instance zero-knowledge proof of knowledge of a pre-image of $f$ in the CRS model.

Proof. The straight-line simulator $S = (S_1, S_2)$ for structured-instance zero-knowledge acts as follows. In the first phase, given $cpar$ and $\bar{y} \in Y$, $S_1$ runs $\mathrm{tdCKG}(cpar, \bar{y})$ to obtain $(td, K)$ and sets the common reference string to $K$. In the second phase, given $td$ and the witness shift $\delta \in X$, $S_2$ runs $\mathrm{tdCom}_K(td)$ to obtain the fake commitment $C$ and state $st$, and sends $C$ to the verifier. Upon receiving the challenge $c$ from the verifier, $S_2$ runs $\mathrm{RstEquiv}_K(td, st, c, \delta)$ to get the response $z$ and the decommitment $D$. By the Σ-equivocability property (Definition 5), it immediately follows that $S$ satisfies the conditions of the definition of straight-line structured-instance zero-knowledge.
5 Aggregatable Zero-Knowledge Proof of Knowledge of e-th Root

Safe RSA Assumption. Since our construction relies on two related instances of the RSA cryptosystem that share the same RSA modulus $n$ but use two different public exponents $e$ and $e'$, it is convenient to use the following notation for RSA instance generation. We call an algorithm $\mathrm{KG}_{srsa}$ a safe RSA generator if, on input security parameter $\kappa$ and a prime $e$ s.t. $2^\kappa < e < 2^{2\kappa}$, $\mathrm{KG}_{srsa}$ generates a pair $(n, d)$ where (1) $n = pq$ s.t. $p = 2p' + 1$, $q = 2q' + 1$, and $p$, $q$, $p'$ and $q'$ are all prime numbers s.t. $|p'| = |q'|$ and $p', q' > 2^{2\kappa}$, and (2) $d = e^{-1} \bmod \phi(n)$. For later use we define $n' = p'q'$. The advantage of an algorithm $A$ in breaking the RSA($e$) problem is defined as

$$\mathrm{Adv}^{\mathrm{RSA}(e)}_{\mathrm{KG}_{srsa},A}(\kappa) = \Pr\left[(n, d) \leftarrow \mathrm{KG}_{srsa}(\kappa, e);\ y \leftarrow \mathbb{Z}_n^*;\ x \leftarrow A(n, e, y) : x^e \equiv y \pmod{n}\right] \qquad (1)$$

We say that algorithm $A$ $(t, \epsilon)$-breaks the RSA($e$) problem on security parameter $\kappa$ if $A$ runs in time at most $t$ and $\mathrm{Adv}^{\mathrm{RSA}(e)}_{\mathrm{KG}_{srsa},A}(\kappa) \geq \epsilon$. We say that the RSA($e$) problem is $(t, \epsilon)$-hard (for security parameter $\kappa$) if no algorithm $A$ $(t, \epsilon)$-breaks it. We note that the requirement $p', q' > 2^{2\kappa}$ is just a lower bound we introduce to enable any party to choose a "secondary" public exponent $e'$ s.t. $\gcd(e', \phi(n)) = 1$ and $e' > le$, where $l$ is the maximum number of participants in any single instance of the multi-signature scheme.
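For intuition about what $\mathrm{KG}_{srsa}$ produces, here is a toy Python analogue. It is a sketch under loudly stated assumptions: primality is tested by trial division (only viable for tiny numbers), the hypothetical parameter `lower` stands in for the bound $2^{2\kappa}$ on $p'$ and $q'$, and the equal-length requirement $|p'| = |q'|$ is not enforced.

```python
def is_prime(k):
    """Trial-division primality test; adequate only for the toy sizes used here."""
    if k < 2:
        return False
    if k % 2 == 0:
        return k == 2
    i = 3
    while i * i <= k:
        if k % i == 0:
            return False
        i += 2
    return True

def next_sophie_germain(start):
    """Smallest prime p' >= start such that 2p' + 1 is also prime."""
    p1 = start
    while not (is_prime(p1) and is_prime(2 * p1 + 1)):
        p1 += 1
    return p1

def kg_srsa_toy(e, lower):
    """Toy analogue of KG_srsa(kappa, e); returns (n, d, n') with n' = p'q'.

    `lower` is a hypothetical stand-in for the bound 2^(2*kappa) on p' and q'.
    """
    p1 = next_sophie_germain(lower + 1)
    q1 = next_sophie_germain(p1 + 1)
    p, q = 2 * p1 + 1, 2 * q1 + 1
    phi = (p - 1) * (q - 1)          # phi(n) = 4 p' q'
    d = pow(e, -1, phi)              # needs gcd(e, phi(n)) = 1: true for an odd prime e < p', q'
    return p * q, d, p1 * q1
```

Because $p', q'$ exceed every admissible $e$, an odd prime $e$ cannot divide $\phi(n) = 4p'q'$, which is why the inverse $d$ always exists here.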

5.1 RSA-Based Multiplicatively Homomorphic Σ-Equivocable Commitment

Let $e$ and $e'$ be two prime numbers s.t. $2^\kappa < e, e' < 2^{2\kappa}$ and $e < e'/l$ for some integer $l$, and let $(n, d)$ be output by $\mathrm{KG}_{srsa}(\kappa, e)$. This ensures that both $(n, e)$ and $(n, e')$ are safe RSA instances. We describe an efficient commitment scheme which is computationally binding under the RSA($e'$) assumption, has the $l$-restricted multiplicatively homomorphic property on message space $\mathcal{M} = QR_n$, and is Σ-equivocable for $f(x) = x^e \bmod n$. Curiously, this commitment is statistically hiding only for messages picked from a specific subset of the message space; however, in our application of this commitment scheme to the straight-line simulatable ZKPK of e-th root, the standard hiding property is not necessary, and the Σ-equivocability property for the above function is sufficient.

CSetup($\kappa$): Pick prime numbers $e$ and $e'$ s.t. $2^\kappa < e, e' < 2^{2\kappa}$ and $e < e'/l$. Run $\mathrm{KG}_{srsa}$ on input $(\kappa, e)$ to obtain $(n, d)$. Set $cpar \leftarrow (n, e, e')$.

CKG($n, e, e'$): Pick $h \leftarrow QR_n$ and set $K \leftarrow (n, e, e', h)$. Note that it is easy to sample random elements of $QR_n$ by squaring a random element of $\mathbb{Z}_n^*$.

Com$_K(m)$: Pick $r \leftarrow \mathbb{Z}_e$ and set $C \leftarrow h^r m^{e'}$ and $D \leftarrow r$. (Hence the decommitment space is $\mathbb{Z}_e$.)

Open$_K(C, r, m)$: Accept iff $C = h^r m^{e'}$ and $0 \le r < e'$.

tdCKG($(n, e, e'), \bar{y}$): Pick $\gamma \leftarrow [n]$, and set $h \leftarrow \bar{y}^{\gamma e'}$, $K \leftarrow (n, e, e', h)$, and $td \leftarrow (\gamma, \bar{y})$.

tdCom$_K(td)$: Pick $s \leftarrow \mathbb{Z}_e$ and return $(C, st)$, where $C = \bar{y}^{e's}$ and $st = s$.

RstEquiv$_K(td, st, c, \delta)$: Compute $r = (s + c)\gamma^{-1} \pmod{e}$ and $i = (s + c - r\gamma)/e$ (over the integers), and return $(r, z)$, where $z = \bar{y}^i \delta^c$.
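The algebra of tdCom and RstEquiv can be checked mechanically. The Python sketch below implements Com, Open, tdCKG, tdCom and RstEquiv over a toy modulus and confirms that the equivocated opening satisfies $\mathrm{Open}_K(C, r, z^e(\bar{y}\delta^e)^{-c}) = 1$. The parameters ($n = 23 \cdot 47$, $e = 5$, $e' = 13$) are illustrative assumptions, and $\gamma$ is resampled until it is invertible modulo $e$, a requirement the description above leaves implicit.

```python
import math
import random

# Toy parameters (assumption, insecure): p' = 11, q' = 23, so p = 23, q = 47 are safe primes.
N = 23 * 47
E, E_PRIME = 5, 13          # e, e' prime, coprime to phi(n) = 1012, and e < e'/2

def random_qr(rng):
    """Sample from QR_n by squaring a random unit of Z_n^*."""
    while True:
        u = rng.randrange(2, N)
        if math.gcd(u, N) == 1:
            return u * u % N

def com(h, m, rng):
    """Com_K(m): r <- Z_e, C = h^r m^e' mod n, D = r."""
    r = rng.randrange(E)
    return pow(h, r, N) * pow(m, E_PRIME, N) % N, r

def open_commitment(h, C, r, m):
    """Open_K(C, r, m): accept iff C = h^r m^e' and 0 <= r < e'."""
    return 0 <= r < E_PRIME and C == pow(h, r, N) * pow(m, E_PRIME, N) % N

def td_ckg(ybar, rng):
    """tdCKG: gamma <- [n] (resampled until invertible mod e), h = ybar^(gamma e')."""
    gamma = rng.randrange(1, N)
    while gamma % E == 0:
        gamma = rng.randrange(1, N)
    return (gamma, ybar), pow(ybar, gamma * E_PRIME, N)

def td_com(td, rng):
    """tdCom_K: s <- Z_e, fake commitment C = ybar^(e' s), state st = s."""
    _, ybar = td
    s = rng.randrange(E)
    return pow(ybar, E_PRIME * s, N), s

def rst_equiv(td, s, c, delta):
    """RstEquiv_K: r = (s + c) gamma^-1 mod e, i = (s + c - r gamma) / e, z = ybar^i delta^c."""
    gamma, ybar = td
    r = (s + c) * pow(gamma, -1, E) % E
    i, rem = divmod(s + c - r * gamma, E)
    assert rem == 0                                  # exact, since r gamma = s + c (mod e)
    z = pow(ybar, i, N) * pow(delta, c, N) % N       # i may be negative; pow then inverts ybar
    return r, z
```

The check works because $z^e(\bar{y}\delta^e)^{-c} = \bar{y}^{ie - c}$, so $h^r m^{e'} = \bar{y}^{e'(\gamma r + ie - c)} = \bar{y}^{e's} = C$.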

Statistical Hiding. This commitment scheme is $\epsilon$-hiding for messages picked from $\mathcal{M}' \subset QR_n$, where $\mathcal{M}' = \{m \in QR_n \mid m^{e'} = h^i,\ i \in [\epsilon e/2]\}$ and $h$ is determined by the commitment key. To argue this, note that the maximum statistical difference between the distributions of the commitments to $m_0, m_1 \in \mathcal{M}'$ occurs when they correspond to $i = 0$ and $i = \epsilon e/2$, respectively. In this case the distributions of the commitments are $\{h^r\}_{r \in \mathbb{Z}_e}$ and $\{h^{r + \epsilon e/2}\}_{r \in \mathbb{Z}_e}$, respectively, which have a statistical difference equal to $\epsilon$.

Computational Binding. This commitment scheme is $(t, \epsilon)$-binding if the RSA($e'$) problem is $(t, \epsilon)$-hard. Indeed, given the challenge $(n, e', h)$, one can use the attacker on binding to find the $e'$-th root of $h$. The reduction runs the binding attacker to obtain $(C, r, m, r', m')$ s.t. $\mathrm{Open}_K(C, r, m) = \mathrm{Open}_K(C, r', m') = 1$ and $m \neq m'$. Since $C = h^r m^{e'} = h^{r'} m'^{e'}$, it follows that $h^{r - r'} = (m'/m)^{e'}$. Note that $r \neq r'$, since otherwise $m^{e'} = m'^{e'}$, which implies $m = m'$ because $\gcd(e', \phi(n)) = 1$. Now since $0 \le r, r' < e'$ and $e'$ is prime, $\gcd(e', r - r') = 1$, and using the extended Euclidean algorithm one can compute $\alpha, \beta$ s.t. $\alpha(r - r') + \beta e' = 1$. Thus $h = h^{\alpha(r - r') + \beta e'} = ((m'/m)^\alpha h^\beta)^{e'}$, and the $e'$-th root of $h$ can be computed as $(m'/m)^\alpha h^\beta$.
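The extraction step of the binding argument is short enough to run. The Python sketch below implements it with an extended Euclidean algorithm over signed integers (since $r - r'$ may be negative). The toy modulus $n = 23 \cdot 47$, $e' = 13$, and the way the test manufactures a collision (by knowing an $e'$-th root $v$ of $h$) are assumptions made only for illustration.

```python
def ext_gcd(a, b):
    """Extended Euclid on signed integers: returns (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                       # normalize the sign of the gcd
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y

def root_from_binding_break(h, r1, m1, r2, m2, e_prime, n):
    """From h^r1 m1^e' = h^r2 m2^e' (mod n) with gcd(r1 - r2, e') = 1, return v with v^e' = h."""
    g, alpha, beta = ext_gcd(r1 - r2, e_prime)
    if g != 1:
        raise ValueError("need gcd(r1 - r2, e') = 1")
    ratio = m2 * pow(m1, -1, n) % n     # m'/m
    return pow(ratio, alpha, n) * pow(h, beta, n) % n
```

The returned value is $(m'/m)^\alpha h^\beta$, exactly the expression derived above.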

$l$-Restricted Multiplicative Homomorphism. This commitment scheme is multiplicatively homomorphic on $QR_n$ in the sense that up to $l \le \lfloor e'/e \rfloor$ messages can be combined: if $\{(C_i, r_i)\}_{i=1..l}$ are commitment-decommitment pairs for messages $m_1, \ldots, m_l \in QR_n$, each computed by the commitment procedure, then $r = \sum_{i=1}^{l} r_i$ (over the integers) is a valid decommitment of the commitment $C = \prod_{i=1}^{l} C_i$ to the message $m = \prod_{i=1}^{l} m_i$ (indeed $C = h^r m^{e'}$ and $0 \le r < le \le e'$). Note that by setting $e' > e 2^\kappa$, the homomorphism can be applied to any feasible set of messages.
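The range check in Open is what limits the number of combinations. In this toy Python sketch (assumed parameters $n = 23 \cdot 47$, $e = 5$, $e' = 13$, so $\lfloor e'/e \rfloor = 2$), two commitments combine into a valid commitment to the product, while combining four commitments with large decommitments pushes $r$ past $e'$ and the opening is rejected.

```python
# Toy parameters (assumption, insecure): n = 23 * 47, e = 5, e' = 13, so floor(e'/e) = 2.
N = 23 * 47
E, E_PRIME = 5, 13

def commit_with(h, m, r):
    """C = h^r m^e' mod n for a given decommitment r."""
    return pow(h, r, N) * pow(m, E_PRIME, N) % N

def open_commitment(h, C, r, m):
    """Open_K: accept iff C = h^r m^e' and 0 <= r < e'."""
    return 0 <= r < E_PRIME and C == commit_with(h, m, r)

def combine(pairs):
    """Homomorphic combination: multiply commitments mod n, add decommitments over Z."""
    C, r = 1, 0
    for C_i, r_i in pairs:
        C = C * C_i % N
        r += r_i
    return C, r
```

Adding decommitments over the integers (rather than modulo $e$) is what keeps the algebraic relation $C = h^r m^{e'}$ exact, at the price of the bound $l \le \lfloor e'/e \rfloor$.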

Σ-Equivocability. This commitment scheme is $2^{-2\kappa}$-Σ-equivocable for the function (family) $f_{(n,e)}(x) = x^e \pmod{n}$. First, note that for every $(n, e, e')$ output by CSetup and every $\bar{y} \in QR_n$ s.t. $\bar{y}$ is a generator of $QR_n$, the distributions of keys generated by $\mathrm{CKG}(n, e, e')$ and $\mathrm{tdCKG}((n, e, e'), \bar{y})$ are at most $2^{-2\kappa}$ apart, because CKG chooses the key $h$ as a random element of $QR_n$, while tdCKG picks $h = \bar{y}^{e'\gamma}$ for $e'$ s.t. $\gcd(e', \phi(n)) = 1$ and $\gamma$ chosen at random in $[n]$. Moreover, the statistical difference between the uniform distributions on $[n]$ and $[4n']$ is equal to $1 - 4n'/n < 2^{-2\kappa}$. Second, if $\bar{y}$ is a generator of $QR_n$, then for every $\gamma \in [n]$, every $\delta \in QR_n$ and every $c \in \{0,1\}^\kappa$, according to the code of tdCom and RstEquiv, $r$ and $z$ satisfy $s + c = \gamma r + ie$ and $z = \bar{y}^i \delta^c$; therefore, for $m = z^e(\bar{y}\delta^e)^{-c}$ we have $C = h^r m^{e'}$, and hence $\mathrm{Open}_K(C, r, m) = 1$. Moreover, the distribution of the decommitments in the equivocation process, i.e., $\{r \mid s \leftarrow \mathbb{Z}_e;\ r \leftarrow (s + c)\gamma^{-1} \pmod{e}\}$, is identical to the uniform distribution over $\mathbb{Z}_e$.
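The last claim holds because $s \mapsto (s + c)\gamma^{-1} \bmod e$ is a bijection of $\mathbb{Z}_e$ whenever $\gamma$ is invertible modulo $e$, so a uniform $s$ yields a uniform $r$. The small Python check below enumerates the map for a toy prime $e = 13$ (an illustrative assumption).

```python
def equivocation_decommitments(e, gamma, c):
    """List r = (s + c) * gamma^-1 mod e for every s in Z_e, as computed by RstEquiv."""
    gamma_inv = pow(gamma, -1, e)      # requires gcd(gamma, e) = 1
    return [(s + c) * gamma_inv % e for s in range(e)]
```

If the returned list is a permutation of $0, \ldots, e-1$ for every $\gamma$ and $c$, the decommitment distribution cannot depend on the challenge.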

Corollary 1. Consider a prime number $2^\kappa < e < 2^{2\kappa}$ and let $n$ be a safe RSA modulus output by $\mathrm{KG}_{srsa}$ on input $e$ and security parameter $\kappa$. Consider the compilation shown in Figure 3, let the function (family) $f$ be $f_{(n,e)} : QR_n \rightarrow QR_n$ s.t. $f_{(n,e)}(x) = x^e \pmod{n}$, and let the compilation be instantiated with the commitment scheme described in this section. Then it immediately follows from Theorem 1 that the resulting scheme is a straight-line structured-instance zero-knowledge proof of knowledge of e-th root.


6 Identity-Based Multisignature Scheme Based on RSA 

We describe our IBMS scheme based on the RSA assumption. The scheme takes two communication rounds, requires two double-exponentiations per party for signing, and one triple-exponentiation for verification. The scheme is based on the GQ ID-based identification protocol [GQ88], which is the Σ-protocol for proving knowledge of an e-th root. Each party simply executes the aggregatable zero-knowledge proof of e-th root of its (hashed) identity string, using the straight-line simulatable aggregatable ZKPK of e-th root described in Section 5. Figure 4 contains the Setup, KeyDer, MSign and Vrfy algorithms of this IBMS scheme.

Note on multi-signature length. In Figure 4, the final multi-signature is a tuple $(z, C, D)$, where $z \in \mathbb{Z}_n^*$ and $(C, D) \in \mathbb{Z}_n^* \times \mathbb{Z}_e$ is a commitment-decommitment pair on message $a = z^e y^{-c}$. However, this commitment can be computed as a deterministic function of the committed message $a$ and the decommitment $D$ (see Section 5.1). Therefore $C$ can be computed given $(z, c, D)$, after which the verifier checks that $c = H_2(C, IdSet, m)$; hence one can use $(z, c, D)$ as the final multi-signature, which reduces the multi-signature size to $|\mathbb{Z}_n^*| + |\mathbb{Z}_e| + \kappa < |n| + 2\kappa + \log l$.
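The following Python sketch runs the whole scheme of Figure 4 end to end for two signers, including verification of the compressed form $(z, c, D)$. It rests on several assumptions that are not part of the scheme: the random oracles $H_1$ and $H_2$ are modelled by SHA-256 (the helper names `_hash_int`, `H1`, `H2` are hypothetical), all signers run in one process, and the toy parameters ($\kappa = 4$, $l = 2$, $p' = 281$, $q' = 293$, $e = 67$, $e' = 251$) merely satisfy the inequalities of Setup while offering no security.

```python
import hashlib
import math
import random

# Toy parameters (assumption, insecure): kappa = 4, at most l = 2 signers.
KAPPA, L = 4, 2
P, Q = 2 * 281 + 1, 2 * 293 + 1          # safe primes 563 and 587, p', q' > 2^(2 kappa)
N, PHI = P * Q, (P - 1) * (Q - 1)
E, E_PRIME = 67, 251
assert 2 ** (KAPPA + 1) * L < E < E_PRIME / L < 2 ** (2 * KAPPA)
MSK = pow(E, -1, PHI)                    # msk = d

def _hash_int(*parts):
    data = "|".join(map(repr, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def H1(identity):
    """Hypothetical H1: hash to a unit of Z_n^* (retrying if needed), then square into QR_n."""
    ctr = 0
    while True:
        v = _hash_int("H1", identity, ctr) % N
        if math.gcd(v, N) == 1:
            return v * v % N
        ctr += 1

def H2(C, ids, message):
    """Hypothetical H2: hash (C, IdSet, m) to a kappa-bit challenge."""
    return _hash_int("H2", C, tuple(sorted(ids)), message) % (2 ** KAPPA)

def random_qr(rng):
    while True:
        u = rng.randrange(2, N)
        if math.gcd(u, N) == 1:
            return u * u % N

def key_der(identity):
    """KeyDer: x_Id = H1(Id)^(2d) mod n, so that x_Id^e = H1(Id)^2."""
    return pow(H1(identity), 2 * MSK, N)

def aggregate_y(ids):
    """y = product of H1(Id_i)^2 over the signing set."""
    y = 1
    for i in ids:
        y = y * pow(H1(i), 2, N) % N
    return y

def commitment_value(h, a, D):
    return pow(h, D, N) * pow(a, E_PRIME, N) % N

def msign(h, message, keys, rng):
    """Run both MSign rounds for every signer locally; keys maps Id -> x_Id."""
    ids = sorted(keys)
    round1 = {}
    for i in ids:                                   # step 3.1: commit to a_i = k_i^e
        k = random_qr(rng)
        r = rng.randrange(E)
        round1[i] = (k, r, commitment_value(h, pow(k, E, N), r))
    C = math.prod(v[2] for v in round1.values()) % N  # step 3.2
    c = H2(C, ids, message)
    z, D = 1, 0
    for i in ids:                                   # responses, aggregated as in step 3.3
        k, r, _ = round1[i]
        z = z * k * pow(keys[i], c, N) % N
        D += r
    return z, C, D

def vrfy(h, message, ids, sig):
    z, C, D = sig
    c = H2(C, ids, message)
    a = pow(z, E, N) * pow(aggregate_y(ids), -c, N) % N
    return 0 <= D < E_PRIME and C == commitment_value(h, a, D)

def vrfy_compressed(h, message, ids, sig):
    """Verify (z, c, D): recompute C from a = z^e y^-c and D, then re-derive c."""
    z, c, D = sig
    a = pow(z, E, N) * pow(aggregate_y(ids), -c, N) % N
    return 0 <= D < E_PRIME and H2(commitment_value(h, a, D), ids, message) == c
```

The compressed check works because $z^e y^{-c} = \prod_j a_j$ and $C = h^D (\prod_j a_j)^{e'}$, so $C$ is fully determined by $(z, c, D)$ and the signing set.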

Theorem 2. If the RSA($e$) and RSA($e'$) problems are $(t', \epsilon')$-hard, and the IBMS scheme in Figure 4 is instantiated with the commitment scheme of Section 5.1, which is $(t_B, \epsilon_B)$-binding and $\epsilon_E$-Σ-equivocable for the function $f_{(n,e)}(x) = x^e \pmod{n}$, then the resulting IBMS scheme is $(t, \epsilon, n, q_K, q_S, q_H)$-secure in the random oracle model, where

$$t \ge \tfrac{1}{2}\min(t', t_B) - (3q_S + q_H)\, t_{exp}$$

$$\epsilon \le 4q_K\left(\sqrt{(\epsilon' + \epsilon_B + \epsilon_E)\, q_H} + \frac{q_H}{2^\kappa}\right) + \epsilon_E$$

and $t_{exp}$ is the time of one exponentiation in $\mathbb{Z}_n^*$.


Proof. Let $\mathcal{C} = (\mathrm{CKG}, \mathrm{Com}, \mathrm{Open}, \mathrm{tdCKG}, \mathrm{tdCom}, \mathrm{RstEquiv})$ be a commitment scheme for public parameters $cpar = (n, e, e')$ and message space $\mathcal{M}$ equal to $QR_n$. Assume $\mathcal{C}$ is $l$-restricted multiplicatively homomorphic, $(t_B, \epsilon_B)$-binding and $\epsilon_E$-Σ-equivocable for $f_{(n,e)}(x) = x^e \pmod{n}$. Given a $(t, \epsilon, n, q_K, q_S, q_H)$-forger $\mathcal{F}$, consider two simulators $B_0$ and $B_1$ that simulate the role of the honest player as in the experiment $\mathrm{Exp}^{\mathrm{IBMS}}_{\mathcal{F}}$, interacting with the forger $\mathcal{F}$. $B_0$ takes as input a set $\{c_1, c_2, \ldots, c_{q_H}\}$, where the $c_i$'s are in $\{0,1\}^\kappa$, runs the Setup procedure to obtain $(mpk, msk)$, and follows the real protocol, i.e., answers the forger's key derivation queries and signing queries using the procedures KeyDer and MSign
1. Setup($1^\kappa$):
Let $l$ be the maximum number of players in the IBMS scheme. Pick prime numbers $e$ and $e'$ s.t. $2^\kappa < e, e' < 2^{2\kappa}$ and $2^{\kappa+1} l < e < e'/l$. Run $\mathrm{KG}_{srsa}$ on input $(\kappa, e)$ to obtain $(n, d)$. Note that $\gcd(e', \phi(n)) = 1$ because $\phi(n) = 4p'q'$, where $p', q' > 2^{2\kappa}$. Run $\mathrm{CKG}(n, e, e')$ to obtain the commitment key $K$. Set $mpk = (n, e, e', K)$ and $msk = d$. Assume $H_1 : \{0,1\}^* \rightarrow QR_n$ and $H_2 : QR_n \times \{0,1\}^* \times QR_n \rightarrow \{0,1\}^\kappa$ are random oracles to which every algorithm in the protocol has access.

2. KeyDer($msk$, $Id$):
The PKG computes $x_{Id} \leftarrow (H_1(Id))^{2d} \pmod{n}$, sets the private key of the user with identity $Id$ as $sk_{Id} \leftarrow x_{Id}$, and sends it to the user via a secure and authenticated channel.

3. MSign: Let $\mathcal{P}$ be the set of players participating in the protocol. Each player determines $\mathcal{P}$ after the first step of MSign. The player with identity $Id_i$, on input $(mpk, m, sk_{Id_i})$, performs the following steps:
3.1 Pick $k_i \leftarrow QR_n$, set $a_i \leftarrow k_i^e$ and $(C_i, D_i) \leftarrow \mathrm{Com}_K(a_i)$, and broadcast $(Id_i, C_i)$;
3.2 Upon receiving $(Id_j, C_j)$ for all $P_j \in \mathcal{P}$, set $IdSet \leftarrow \{Id_j\}_{P_j \in \mathcal{P}}$ and $C \leftarrow \bigotimes_{P_j \in \mathcal{P}} C_j$; set $c \leftarrow H_2(C, IdSet, m)$; compute $z_i \leftarrow k_i (x_{Id_i})^c$ and broadcast $(z_i, D_i)$;
3.3 Output the multisignature $\sigma = (z, C, D)$, where $z = \prod_{P_j \in \mathcal{P}} z_j$ and $D = \bigoplus_{P_j \in \mathcal{P}} D_j$.

4. Vrfy($mpk, m, IdSet, \sigma$):
Parse $\sigma$ as $(z, C, D)$ and $mpk$ as $(n, e, e', K)$; set $c \leftarrow H_2(C, IdSet, m)$ and $y \leftarrow \prod_{Id_i \in IdSet} H_1(Id_i)^2$. If $\mathrm{Open}_K(C, D, z^e y^{-c}) = 1$ then accept, otherwise reject.

Fig. 4. Identity-based multisignature scheme based on RSA


respectively. Additionally, $B_0$ answers the forger's hash queries and performs an extra finalization process by following the procedures SimHash and Finalize in Figure 5. The simulator $B_1$, on the other hand, takes as input an RSA challenge $(n, e, \bar{y})$ and a set $\{c_1, c_2, \ldots, c_{q_H}\}$, where the $c_i$'s are in $\{0,1\}^\kappa$, and follows the Init, SimKeyDer, SimMSign, SimHash and Finalize procedures detailed in Figure 5 to perform the initialization, answer the key derivation, signing and hash queries, and carry out the finalization, respectively. Intuitively, the simulator $B_1$ uses Coron's technique [Cor00] to embed the RSA challenge in the hashes of the players' IDs with some biased probability $1 - p$, hoping that the forgery is based upon the ID of a player for which the RSA challenge is indeed embedded. This way, $B_1$ answers the signing queries on behalf of identity $Id$ just like the real protocol, using the procedure MSign, if the RSA challenge is not embedded in the hash of $Id$; otherwise $B_1$ uses the straight-line structured-instance zero-knowledge simulator for proof of knowledge of e-th root (see Corollary 1) to simulate the signature protocol on behalf of the identity $Id$. Both $B_0$ and $B_1$, after receiving a valid forgery from $\mathcal{F}$, perform a finalization phase in which the forged multisignature is returned together with the index of the hash response upon which it is based. Namely, both $B_0$ and $B_1$ return $(j, (m, IdSet, \sigma))$ s.t. $\mathrm{Vrfy}(mpk, m, IdSet, \sigma) = 1$ and there exists at least one uncorrupted $Id$ s.t. $(m, Id)$ was never queried for signing. The simulators $B_0$ and $B_1$ set up empty tables $\mathbf{H}_1$ and $\mathbf{H}_2$ to simulate the hash functions $H_1$ and $H_2$, respectively, and use the set $\{c_1, c_2, \ldots, c_{q_H}\}$ to answer the hash queries to $H_2$, which enables the use of the forking lemma (as formulated, e.g., in [BN06, BCJ08]).

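Now for $I \in \{0, 1\}$, let us lower-bound $\mathrm{acc}_{B_I}$, the probability that $B_I$ generates a "useful" output, i.e., an output other than $(0, \perp)$. This happens when $\mathcal{F}$ outputs a valid forgery and $B_I$ does not abort in any of the key derivation queries or in the finalization procedure. Therefore $\mathrm{acc}_{B_I} \ge \epsilon_{B_I}\, p^{q_K}(1 - p)$, where $\epsilon_{B_I}$ denotes the success probability of $\mathcal{F}$ in its interaction with $B_I$. The function $p^{q_K}(1 - p)$ reaches its maximum at $p = q_K/(q_K + 1)$. Substituting this value of $p$ yields:

$$\mathrm{acc}_{B_I} \ge \frac{\epsilon_{B_I}}{q_K + 1}\left(\frac{q_K}{q_K + 1}\right)^{q_K} = \frac{\epsilon_{B_I}}{q_K}\left(\frac{q_K}{q_K + 1}\right)^{q_K + 1} \ge \frac{\epsilon_{B_I}}{4q_K}$$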
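The choice of $p$ and the final constant $1/4$ can be sanity-checked numerically. The Python sketch below evaluates the no-abort probability $p^{q_K}(1 - p)$ on a grid, confirms that $p = q_K/(q_K + 1)$ is the maximizer, and checks the bound $1/(4q_K)$, which is tight at $q_K = 1$.

```python
def no_abort_probability(p, q_k):
    """p^{q_K} (1 - p): survive q_K key-derivation queries, then hit an embedded identity."""
    return p ** q_k * (1 - p)

def optimal_p(q_k):
    """Maximizer of p^{q_K} (1 - p) on [0, 1]."""
    return q_k / (q_k + 1)
```

The grid search is only a numerical confirmation; the maximizer itself follows from setting the derivative $p^{q_K - 1}(q_K - (q_K + 1)p)$ to zero.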



For $I \in \{0, 1\}$, consider $\mathcal{F}_{B_I}$, the forking algorithm associated with $B_I$. The success event of $\mathcal{F}_{B_I}$, denoted $E^{B_I}$, is that $\mathcal{F}_{B_I}$ outputs two tuples $(c_j, (x, n_1, m, IdSet, \sigma))$ and $(\tilde{c}_j, (\tilde{x}, \tilde{n}_1, \tilde{m}, \widetilde{IdSet}, \tilde{\sigma}))$ s.t. $c_j \neq \tilde{c}_j$, where $j$ is the index of the hash response upon which the forged multisignature is based. Since the random coins of $B_I$ and the hash responses given to $B_I$ before the $j$-th query are the same in the first and second executions, all the computations and communications, and in particular the queries submitted to the hash function $H_2$ before the $j$-th query, must be the same, too. Thus the occurrence of $E^{B_I}$ implies $\widetilde{IdSet} = IdSet$, $\tilde{C} = C$ and $\tilde{m} = m$. Note that $\widetilde{IdSet} = IdSet$ also implies $\tilde{y} = y$. This is because $y = \prod_{Id_i \in IdSet}(H_1(Id_i))^2$, $\tilde{y} = \prod_{Id_i \in \widetilde{IdSet}}(H_1(Id_i))^2$, and the values $H_1(Id_i)$ for all $Id_i \in IdSet$ are fixed before the fork. The success event $E^{B_I}$ can be partitioned into two cases: (1) event $E_1^{B_I}$, in which $E^{B_I}$ happens and $z^e y^{-c_j} = \tilde{z}^e y^{-\tilde{c}_j}$; (2) event $E_2^{B_I}$, in which $E^{B_I}$ happens and $z^e y^{-c_j} \neq \tilde{z}^e y^{-\tilde{c}_j}$. Obviously $E^{B_I} = E_1^{B_I} \cup E_2^{B_I}$, and hence $\Pr[E^{B_I}] \le \Pr[E_1^{B_I}] + \Pr[E_2^{B_I}]$. On the other hand, according to the forking lemma, $\Pr[E^{B_I}]$ can be lower bounded in terms of $\mathrm{acc}_{B_I}$, the success probability of the simulator $B_I$:

$$\mathrm{acc}_{B_I}\left(\frac{\mathrm{acc}_{B_I}}{q_H} - \frac{1}{2^\kappa}\right) \le \Pr[E^{B_I}] \le \Pr[E_1^{B_I}] + \Pr[E_2^{B_I}] \qquad (2)$$


If the $c_i$'s are uniformly distributed in $\{0,1\}^\kappa$, then $\mathcal{F}$'s view in the interaction with $B_0$ is identical to the real execution of the protocol. As for $B_1$, since $\mathcal{C}$ is $\epsilon_E$-Σ-equivocable, by the straight-line structured-instance simulatability of the ZKPK of e-th root, first, the distributions of the commitment keys in the simulation and in the real protocol are at most $\epsilon_E$ apart, and second, the distribution of the tuples $(C_i, D_i, z_i)$ generated in each signature instance in the interaction between $\mathcal{F}$ and $B_1$ is identical to the distribution of the same variables in the real execution. Thus, since our simulation is straight-line, the total distance between $\mathcal{F}$'s view in the interaction with $B_1$ and in the real execution is at most $\epsilon_E$. This implies in particular that $\epsilon_{B_0} = \epsilon$, $|\epsilon_{B_1} - \epsilon| \le \epsilon_E$ and $|\Pr[E_2^{B_0}] - \Pr[E_2^{B_1}]| \le \epsilon_E$. So $\epsilon/4q_K \le \mathrm{acc}_{B_0}$ and $(\epsilon - \epsilon_E)/4q_K \le \mathrm{acc}_{B_1}$. Thus, applying equation (2) to $B_1$ and bounding $\Pr[E_2^{B_1}]$ by $\Pr[E_2^{B_0}] + \epsilon_E$, we obtain:

$$\frac{\epsilon - \epsilon_E}{4q_K}\left(\frac{\epsilon - \epsilon_E}{4q_K q_H} - \frac{1}{2^\kappa}\right) \le \Pr[E_1^{B_1}] + \Pr[E_2^{B_0}] + \epsilon_E \qquad (3)$$
Init($n, e, \bar{y}$):
Pick a prime $e'$ where $el < e' < 2^{2\kappa}$ and run $\mathrm{tdCKG}((n, e, e'), \bar{y})$ to get $(td, K)$; set $mpk$ as $(n, e, e', K)$ and run $\mathcal{F}$ on input $mpk$.

SimKeyDer($Id$):
Query $H_1$ on $Id$ and look up $\mathbf{H}_1[Id]$ to get $(b, \delta, y)$. If $b = 0$, return $\delta$; otherwise abort the simulation with failure, outputting $(0, \perp)$.

SimMSign($m, Id$):
Query $H_1$ on $Id$ and look up $\mathbf{H}_1[Id]$ to get $(b, \delta, y)$. If $b = 0$ then run MSign($m, Id$); otherwise:
- $(C, st) \leftarrow \mathrm{tdCom}_K(td)$; send $(Id, C)$ to $\mathcal{F}$;
- upon receiving $(Id_j, C_j)$ for $P_j \in \mathcal{P}$: $IdSet \leftarrow \{Id_j\}_{P_j \in \mathcal{P}}$; $C \leftarrow \bigotimes_{P_j \in \mathcal{P}} C_j$; $c \leftarrow H_2(C, IdSet, m)$;
- $(D, z) \leftarrow \mathrm{RstEquiv}_K(td, st, c, \delta)$; send $(z, D)$ to $\mathcal{F}$;
- $z \leftarrow \prod_{P_j \in \mathcal{P}} z_j$; $D \leftarrow \bigoplus_{P_j \in \mathcal{P}} D_j$;
- output $\sigma = (z, C, D)$.

SimHash:
$H_1(Id)$: If $Id$ has not been queried before, pick $\delta$ uniformly at random from $QR_n$ and toss a biased coin $b$ so that $b = 0$ with probability $p$ and $b = 1$ with probability $1 - p$. If $b = 0$, set $y \leftarrow \delta^e$; otherwise set $y \leftarrow \bar{y}\delta^e$. Store $(b, \delta, y)$ in $\mathbf{H}_1[Id]$. Return $\mathbf{H}_1[Id]$.
$H_2(C, IdSet, m)$: If $(C, IdSet, m)$ is the $i$-th distinct query of $\mathcal{F}$ to $H_2$, then query $H_1(Id_i)$ for every $Id_i \in IdSet$ and set $\mathbf{H}_2[(C, IdSet, m)] \leftarrow c_i$. Return $\mathbf{H}_2[(C, IdSet, m)]$.

Finalize:
Upon receiving a valid forgery $(m, IdSet, \sigma)$ from $\mathcal{F}$, parse $\sigma$ as $(z, C, D)$ and query $H_2$ on $(C, IdSet, m)$. Let $IdSet_0 = \{Id_i \mid b_i = 0\}$ and $IdSet_1 = \{Id_i \mid b_i = 1\}$. If $IdSet_1 = \emptyset$, then abort the simulation with failure, outputting $(0, \perp)$. Otherwise set $x \leftarrow \prod_{Id_i \in IdSet} \delta_i$ and $n_1 = |IdSet_1|$, and return $(j, (x, n_1, m, IdSet, \sigma))$, where $j$ is the index of $c$ in the hash table $\mathbf{H}_2$.

Fig. 5. The procedures SimHash and Finalize that $B_0$ and $B_1$ use and the procedures Init, SimKeyDer, and SimMSign that $B_1$ uses

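The actual reduction algorithm $\mathcal{R}$ runs both $\mathcal{F}_{B_0}$ and $\mathcal{F}_{B_1}$. If $E_1^{B_1}$ happens, then $z^e y^{-c_j} = \tilde{z}^e y^{-\tilde{c}_j}$. Substituting $y = \tilde{y} = \bar{y}^{2n_1} x^{2e}$, where $n_1$ is the number of players for whom the reduction has embedded the challenge (see Figure 5), yields

$$\left((z/\tilde{z})\, x^{2(\tilde{c}_j - c_j)}\right)^e = \bar{y}^{2n_1(c_j - \tilde{c}_j)} \qquad (4)$$

Now, since $l\, 2^{\kappa+1} < e$, we have $0 < |2n_1(c_j - \tilde{c}_j)| < e$, and since $e$ is prime, $\gcd(e, 2n_1(c_j - \tilde{c}_j)) = 1$; therefore one can easily compute the e-th root of $\bar{y}$ using the extended Euclidean algorithm.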
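The last step of the reduction is the standard "Shamir trick" for turning a relation $u^e = \bar{y}^a$ with $\gcd(a, e) = 1$ into an e-th root of $\bar{y}$. The Python sketch below uses a modular inverse instead of spelling out the extended Euclidean algorithm: with $\alpha = a^{-1} \bmod e$ and $\alpha a = 1 + te$, the root is $u^\alpha \bar{y}^{-t}$. It also checks the gcd condition exhaustively for toy values $\kappa = 3$, $l = 2$, $e = 37$ (illustrative assumptions), where $l\,2^{\kappa+1} = 32 < e$.

```python
import math

def challenge_exponent_ok(e, n1, c, c_tilde):
    """gcd(e, 2 n1 (c - c~)) = 1: the condition needed to extract an e-th root from (4)."""
    return math.gcd(e, 2 * n1 * (c - c_tilde)) == 1

def eth_root_from_relation(u, ybar, a, e, n):
    """Given u^e = ybar^a (mod n) with gcd(a, e) = 1, return w with w^e = ybar (mod n).

    With alpha = a^-1 mod e we have alpha * a = 1 + t e, hence w = u^alpha * ybar^(-t).
    """
    alpha = pow(a, -1, e)
    t, rem = divmod(alpha * a - 1, e)
    assert rem == 0
    return pow(u, alpha, n) * pow(ybar, -t, n) % n
```

In equation (4), $u = (z/\tilde{z})\, x^{2(\tilde{c}_j - c_j)}$ and $a = 2n_1(c_j - \tilde{c}_j)$, which may be negative; Python's `pow` handles both the negative base in the inverse and the negative exponent on $\bar{y}$.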

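If $E_2^{B_0}$ happens, then $\mathcal{R}$ immediately translates it into an attack against the binding property of the commitment scheme $\mathcal{C}$ by returning $(C, D, \tilde{D}, z^e y^{-c_j}, \tilde{z}^e y^{-\tilde{c}_j})$. To see this, note that, as argued before, $y = \tilde{y}$ and $C = \tilde{C}$; since $E_2^{B_0}$ has occurred, $z^e y^{-c_j} \neq \tilde{z}^e y^{-\tilde{c}_j}$, and due to the validity of the forgeries we have $\mathrm{Open}_K(C, D, z^e y^{-c_j}) = \mathrm{Open}_K(C, \tilde{D}, \tilde{z}^e y^{-\tilde{c}_j}) = 1$. Moreover, the commitment key $K$ is output by CKG in the execution of $B_0$. Thus $\Pr[E_1^{B_1}] \le \epsilon'$ and $\Pr[E_2^{B_0}] \le \epsilon_B$, and hence equation (3) becomes

$$\frac{\epsilon - \epsilon_E}{4q_K}\left(\frac{\epsilon - \epsilon_E}{4q_K q_H} - \frac{1}{2^\kappa}\right) \le \epsilon' + \epsilon_B + \epsilon_E \qquad (5)$$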
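To see how a bound of the form stated in Theorem 2 can be read off from (5), one can solve the quadratic inequality. In the sketch below, $X = (\epsilon - \epsilon_E)/(4q_K)$ and $S = \epsilon' + \epsilon_B + \epsilon_E$ are abbreviations introduced only for this derivation, and the second step uses $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$.

```latex
\begin{align*}
X\Big(\frac{X}{q_H} - \frac{1}{2^\kappa}\Big) \le S
  &\iff X^2 - \frac{q_H}{2^\kappa}\,X - q_H S \le 0 \\
  &\implies X \le \frac{q_H}{2^{\kappa+1}} + \sqrt{\frac{q_H^2}{2^{2\kappa+2}} + q_H S}
   \;\le\; \frac{q_H}{2^{\kappa}} + \sqrt{q_H S} \\
  &\implies \epsilon \le \epsilon_E + 4q_K\Big(\sqrt{(\epsilon' + \epsilon_B + \epsilon_E)\,q_H} + \frac{q_H}{2^\kappa}\Big)
\end{align*}
```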
The running time $t_R$ of the reduction algorithm $\mathcal{R}$ is twice the maximum of the running times of the algorithms $B_0$ and $B_1$. The running time of $B_0$ and $B_1$ is dominated by the running time of the forger $\mathcal{F}$ plus the time spent by the simulators answering the hash, signing and key derivation queries. Thus $t_R \le 2(t + (3q_S + q_H)\,t_{exp})$, where $t_{exp}$ is the time required for an exponentiation in $\mathbb{Z}_n^*$. On the other hand, since $\mathcal{R}$ either answers the RSA challenge or returns an attack against the binding property of the commitment $\mathcal{C}$, it must be true that $\min(t', t_B) \le t_R$. Thus:

$$t \ge \tfrac{1}{2}\min(t', t_B) - (3q_S + q_H)\, t_{exp}$$

7 Identity-Based Aggregate Signature Scheme 

The construction in the previous section can easily be modified to obtain a 2-round identity-based aggregate signature (IBAS) scheme provably secure under the RSA assumption. For this purpose, one needs to modify the verification algorithm to support the case where different challenges are obtained in step 3.2 of the protocol due to querying $H_2$ on different messages. More precisely, the resulting IBAS scheme is exactly the same as the scheme described in Figure 4, except that its verification algorithm is as follows: parse $\sigma$ as $(z, C, D)$ and $mpk$ as $(n, e, e', K)$; compute $R \leftarrow z^e \prod_{Id_i \in IdSet} (H_1(Id_i))^{-2c_i}$, where $c_i$ is the output of $H_2$ on input $(C, IdSet, m_i)$, and check whether $\mathrm{Open}_K(C, D, R) = 1$.

The security proof for this IBAS scheme is similar to the proof of the IBMS scheme given in the previous section. Namely, the reduction runs two simulators: in one simulator the challenge is embedded in the commitment key, and in the other it is embedded in the hashes of the IDs. Therefore, with high probability, if a forgery happens, the reduction translates it either into an attack on the binding property of the commitment scheme (event $E_2$ in the previous proof) or into the e-th root of the challenge (event $E_1$ in the previous proof). The most important difference is that, in order to find the e-th root of the challenge, we have the following equation instead of equation (4) in the previous proof:



Therefore, to be able to compute the e-th root of $\bar{y}$, we need $\gcd(e, 2\sum_{Id_i \in IdSet}(c_i - \tilde{c}_i)) = 1$. In particular, the reduction succeeds as long as $\sum_{Id_i \in IdSet}(c_i - \tilde{c}_i) \not\equiv 0 \pmod{e}$, i.e., unless the challenges in the two branches of the forking algorithm sum to the same value mod $e$, which happens with only negligible probability.
Abstract. We propose a framework for adaptive security from hard
random lattices in the standard model. Our approach borrows from the 
recent Agrawal-Boneh-Boyen families of lattices, which can admit reliable 
and punctured trapdoors, respectively used in reality and in simulation. 
We extend this idea to make the simulation trapdoors cancel not for a 
specific forgery but on a non-negligible subset of the possible challenges. 
Conceptually, we build a compactly representable, large family of input- 
dependent “mixture” lattices, set up with trapdoors that “vanish” for a 
secret subset which we hope the forger will target. Technically, we tweak 
the lattice structure to achieve “naturally nice” distributions for arbitrary 
choices of subset size. The framework is very general. Here we obtain fully 
secure signatures, and also IBE, that are compact, simple, and elegant. 


1 Introduction 

Lattices are currently enjoying renewed interest in cryptography, owing to a combination of mathematical elegance, implementation simplicity, provable security reductions, and, more recently, rather dramatic gains in efficiency that bring them closer to the familiar discrete-log and factoring-based approaches.
Lattice-based crypto also offers the hope of withstanding quantum computers, 
against which both discrete-log and factoring-based approaches are known to be 
utterly defenseless. As a few examples of influential lattice-based cryptosystems 
and foundations, we mention |hlhl1 VII ftl4l23l22ITn] . among many others. 

Still, by far the biggest barrier to the practical deployment of lattice-based 
cryptographic systems remains their space inefficiency, which may exceed by 
several orders of magnitude that of the mainstream. This is especially true for 
systems based on so-called “hard” random integer lattices, which have essentially 
no structure other than being periodic modulo the same modulus q along every 
coordinate axis. Hard lattices have the drawback of requiring voluminous representations, especially when compared to lattices with additional structure such
as cyclic or ideal lattices. Being devoid of structure, however, hard lattices may 
harbor potentially tougher “hard problems” for a safer foundation for crypto. 


P.Q. Nguyen and D. Pointcheval (Eds.): PKC 2010, LNCS 6056, pp. 499-517, 2010.
© International Association for Cryptologic Research 2010





500 X. Boyen 


Since a primary motivation for lattice cryptography is to hedge against the doomsday of mainstream assumptions, it seems worthwhile to endeavor to build cryptosystems that are as efficient and provably secure as we can make them from hard lattices.


1.1 Related Work 

Several advances toward efficient lattice-based signatures have recently been made. We mention those most closely comparable to this work.

Lyubashevsky and Micciancio [16] gave an elegant one-time signature on cyclic lattices, which was then lifted into a many-time signature using a standard tree construction. The signature was stateful and not truly hash-and-sign.

Gentry et al. [13] were the first to realize identity-based encryption from (hard) lattices, with a fully secure construction that implied a very efficient signature as a by-product. Their security proof crucially relied on random oracles.

Cash et al. [11] and Peikert [21] then managed to remove the random oracle and add a hierarchy, using an elegant but bandwidth-intensive bit-by-bit scheme (also concurrently proposed by Agrawal and Boyen [3] sans hierarchy), reminiscent of Canetti et al. [10]. Additionally, Peikert [21] showed how to make a simpler signature from the bit-by-bit framework, albeit with an IBE-precluding "salt", using the recent prefix signature technique of Hohenberger and Waters [14].
Boneh et al. [1] soon thereafter showed how to avoid the bit-by-bit IBE construction in favor of a compact and efficient all-at-once encoding, creating a selectively secure scheme reminiscent of Boneh and Boyen [8]. Though it does not natively give a secure signature (a costly generic conversion would be needed), we mention it because our framework turns it into a fully secure IBE and more.


Bandwidth Requirements. The following table compares the space efficiency of 
those recent signature schemes (in hard lattices, unless indicated otherwise). 



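[Table comparing the SIS strength and per-entity sizes of recent lattice-based signature schemes; the table body is not recoverable from the source.]

Parameter $\lambda$ is the security level; $\ell$ the message bit-size; $q$ the modulus; and $m$ and $n$ the lattice and constraint dimensions, where $\lambda \approx n < m = O(n \log q)$. Tabulated SIS strength is the approximation factor $\beta$ incurred by the security reduction; and $|entity|$ is the number of entries in $\mathbb{Z}$ and/or $\mathbb{Z}_q$ per entity, with up to $\lceil \log q \rceil$ (resp. $\lceil \log \beta \rceil$) bits per entry.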

We remark that moderate differences in the approximation parameter ($\beta$) have limited practical impact compared to variations in the number ($|\cdot|$) of entries. Indeed, $\beta$ is linked to the modulus of $\mathbb{Z}_q$ and the norm of entries in $\mathbb{Z}$, and varying their magnitudes by a factor $z = \lambda^{k_1} n^{k_2} = \mathrm{poly}(q)$ only affects the information-theoretic bit sizes by a factor $1 + \log z/\log q = O(1)$. By contrast, if we vary the number of entries by a factor $z$, the total bit sizes vary by a factor $\Theta(z)$. Note that $\forall \beta = \mathrm{poly}(n)$, there is an average-case $\beta$-SIS reduction from worst-case SIVP with approximation factors $\gamma = \tilde{O}(\beta\sqrt{n})$, widely believed hard $\forall \gamma = \mathrm{poly}(n)$. The concrete parametric hardness of these assumptions is estimated in [12, 20].

1.2 Contribution 

In this work, we propose a lattice-based encoding framework that generalizes the all-at-once encoding of Agrawal et al. [1]. The relationship of this work to the other one is akin to that linking Waters [24] to Boneh and Boyen [8] in pairing groups. Our goal is to build compact, practical, and "fully secure" signatures and identity-based encryption from hard integer lattices in the standard model.

Here we focus on signatures. Our main construction is a stateless "hash-and-sign" fully secure signature, i.e., existentially unforgeable under chosen-message attacks, that is about as short as [13]. Our main result is a standard-model security reduction for it and related schemes (from the classic average-case SIS problem, itself reducible from worst-case SIVP and other hard problems [5, 23]).

As a bonus, our framework yields a clean “unsalted” construction that extends 
effortlessly from signature to identity-based private-key extraction. The two are 
indeed closely related, except that certain tricks used to make signatures secure 
are incompatible with IBE, such as black-box randomized hashes whose “nonces” 
would be inaccessible to a non-interactive encrypting party. Our framework does not have this problem, and has already been used to make the IBE scheme of [1] fully secure with little loss of efficiency (see the full version of [1] for details).

1.3 Highlights 

Technically, we obtain our compact signature by “mixing” together, in a message- 
dependent manner, a number of public-key matrices in order to induce in a 
deterministic way a large family of hard lattices. A signature is a short non-zero 
vector in the appropriate lattice. For proving adaptive security, we arrange the 
lattice melange in such a way that a signing trapdoor, i.e., a short lattice basis, 
is always available for every possible input in the real scheme. In the simulation, 
faulty trapdoors will be made to vanish through spurious cancellations for a 
certain, suitably sized set of “challengeable inputs”, unknown to the adversary. 

A crucial and novel feature of our framework is to ensure that the challenge- 
able inputs are well spread out over the entire input space, regardless of the 
selected size of the challengeable set. This ensures that, regardless of the actions 
of the adversary, the simulation will unfold with a significant and more or less 
invariant probability of success. This simulation robustness property is unusual 
and key to achieving an efficient security reduction. 

Earlier schemes, also based on this principle of small but non-negligible challengeable input sets, generally did not have the luxury of uniform distributions
over custom domains; they had to provision complex mechanisms to compensate
for the non-uniformity of certain events as a function of the adversary's actions. The Waters [24] scheme, for example, contains such a mechanism, prompted by the non-existence of distributions of non-negligible equal weights over exponentially sized groups as used in pairing-based cryptography.

With lattices, by contrast, the possibility to work with smaller moduli gives 
us an extra handle on the construction of “nice” distributions for a very wide 
range of challengeable input set sizes. As a result, we obtain security reductions 
that are simpler, tighter, and more efficient. 

2 Lattice Notions 

Here we gather a number of useful notions and results from the literature. 

We denote by $\|A\|$ or $\|a\|$ the $\ell_2$-norm of a matrix $A$ or vector $a$. We denote by $\tilde{A}$ the Gram-Schmidt ordered orthogonalization of $A$, and its $\ell_2$-norm by $\|\tilde{A}\|$.

2.1 Random Integer Lattices 

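Definition 1. Let a basis $B = [\,b_1 \mid \ldots \mid b_m\,] \in \mathbb{R}^{m \times m}$ be an $m \times m$ matrix with linearly independent columns $b_1, \ldots, b_m \in \mathbb{R}^m$. The lattice $\Lambda$ generated by the basis $B$ and its dual $\Lambda^*$ are defined as follows (both are $m$-dimensional):

$$\Lambda = \Big\{ y \in \mathbb{R}^m \ \text{s.t.}\ \exists s \in \mathbb{Z}^m,\ y = Bs = \textstyle\sum_{i=1}^m s_i b_i \Big\}, \qquad \Lambda^* = \big\{ z \in \mathbb{R}^m \ \text{s.t.}\ \forall y \in \Lambda,\ \langle z, y \rangle \in \mathbb{Z} \big\}$$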



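Definition 2. For a positive integer $q$ (later a prime) and a matrix $A \in \mathbb{Z}_q^{n \times m}$, define two $m$-dimensional full-rank integer lattices:

$$\Lambda^\perp(A) = \{ e \in \mathbb{Z}^m \ \text{s.t.}\ Ae = 0 \pmod{q} \}, \qquad \Lambda(A) = \{ y \in \mathbb{Z}^m \ \text{s.t.}\ \exists s \in \mathbb{Z}^n,\ A^\top s = y \pmod{q} \}$$

These are dual when properly scaled, as $\Lambda^\perp(A) = q\Lambda(A)^*$ and $\Lambda(A) = q\Lambda^\perp(A)^*$.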
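Membership in both lattices is easy to test for tiny examples. The Python sketch below checks $x \in \Lambda^\perp(A)$ directly and $y \in \Lambda(A)$ by brute force over $s \in \mathbb{Z}_q^n$ (feasible only for toy sizes), and illustrates the duality: for $x \in \Lambda^\perp(A)$ and $y \in \Lambda(A)$, $\langle x, y \rangle \equiv 0 \pmod{q}$, i.e., $x/q \in \Lambda(A)^*$. The matrix $A = (1\ 2\ 3)$ with $q = 7$ is an arbitrary illustrative choice.

```python
import itertools

def in_lambda_perp(A, x, q):
    """x in Lambda^perp(A)  iff  A x = 0 (mod q)."""
    return all(sum(a * xi for a, xi in zip(row, x)) % q == 0 for row in A)

def in_lambda(A, y, q):
    """y in Lambda(A)  iff  y = A^T s (mod q) for some s in Z_q^n (brute force, toy sizes only)."""
    n, m = len(A), len(A[0])
    for s in itertools.product(range(q), repeat=n):
        if all((sum(A[i][j] * s[i] for i in range(n)) - y[j]) % q == 0 for j in range(m)):
            return True
    return False

def inner(x, y):
    """Standard inner product <x, y> over the integers."""
    return sum(a * b for a, b in zip(x, y))
```

Both lattices contain $q\mathbb{Z}^m$, which is why they are full-rank even though $A$ has only $n < m$ rows.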

2.2 Bases and Trapdoors 

A fundamental result in the geometry of numbers is that every lattice $\Lambda$ has a basis; see, e.g., [15]. Implicit in its proof is the well-known fact that any full-rank set $S_A \subset \Lambda$ can be converted into a basis $T_A$ for $\Lambda$ with no greater orthogonalized norm, $\|\tilde{T}_A\| \le \|\tilde{S}_A\|$.

Fact 3. For a set $X = \{x_1, \ldots, x_m\}$ of lattice vectors, let $\tilde{X} = \{\tilde{x}_1, \ldots, \tilde{x}_m\}$ be its Gram-Schmidt ordered orthogonalization. There is a deterministic polynomial-time algorithm that, on input an arbitrary basis of an $m$-dimensional lattice $\Lambda$ and a full-rank set $S = \{s_1, \ldots, s_m\} \subset \Lambda$ of lattice vectors, returns a basis $T = \{t_1, \ldots, t_m\}$ of $\Lambda$ such that $\|\tilde{t}_i\| \le \|\tilde{s}_i\|$ for all $i = 1, \ldots, m$.
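Since Fact 3 and the later trapdoor bounds are stated in terms of Gram-Schmidt norms, a small exact implementation may help. The Python sketch below computes the ordered Gram-Schmidt orthogonalization with rational arithmetic and takes $\|\tilde{B}\|$ to be the length of the longest Gram-Schmidt vector, which is an assumption about the matrix-norm convention (the customary one in this line of work). The test also shows that replacing $b_2$ by $b_1 + b_2$, which yields another basis of the same lattice, leaves $\tilde{b}_2$ unchanged.

```python
import math
from fractions import Fraction

def gram_schmidt(columns):
    """Ordered Gram-Schmidt orthogonalization with exact rational arithmetic.

    b~_i = b_i - sum_{j<i} mu_ij b~_j with mu_ij = <b_i, b~_j> / <b~_j, b~_j>;
    the input vectors must be linearly independent.
    """
    ortho = []
    for b in columns:
        v = [Fraction(x) for x in b]
        for u in ortho:
            mu = sum(Fraction(x) * y for x, y in zip(b, u)) / sum(y * y for y in u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho

def gs_norm(columns):
    """||B~||, taken here as the Euclidean length of the longest Gram-Schmidt vector."""
    return max(math.sqrt(sum(x * x for x in v)) for v in gram_schmidt(columns))
```

Exact fractions avoid the rounding drift that floating-point Gram-Schmidt accumulates, at the cost of speed.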
Ajtai [6] shows how to sample a uniform matrix $A \in \mathbb{Z}_q^{n \times m}$ with an associated full-rank set $S_A \subset \Lambda^\perp(A)$ of low-norm vectors orthogonal to $A$ modulo $q$. Tightness was later improved by Alwen and Peikert [7].

Proposition 4 ([?]). For any $\delta_0 > 0$, there is a probabilistic polynomial-time
algorithm that, on input a security parameter $1^\lambda$, an odd prime $q = \mathrm{poly}(\lambda)$,
and two integers $n = O(\lambda)$ and $m \ge (5 + 3\delta_0)\, n \log q$, outputs a matrix $A \in \mathbb{Z}_q^{n\times m}$
that is statistically $(m q^{-\delta_0 n/2})$-close to uniform, and a basis $T_A \subset \Lambda^\perp(A)$ such
that, with overwhelming probability, $\|T_A\| \le O(n \log q)$ and $\|\tilde{T}_A\| \le O(\sqrt{n \log q})$.

For the purpose of this paper, we take $\delta_0 = 1/3$, assume $L = \Omega(\sqrt{m})$, and
summarize the foregoing as follows.

Fact 5. There is a probabilistic polynomial-time algorithm that, on input a security
parameter $1^\lambda$, an odd prime $q = \mathrm{poly}(\lambda)$, and two integers $n = O(\lambda)$ and $m \ge 6\, n \log q$,
outputs a matrix $A \in \mathbb{Z}_q^{n\times m}$ statistically close to uniform, and a basis
$T_A$ for $\Lambda^\perp(A)$ such that, with overwhelming probability, $\|\tilde{T}_A\| \le O(\sqrt{m}) \le L$.


2.3 Discrete Gaussians 

Given a basis for a random integer lattice, we recall how to sample random
lattice points from a discrete Gaussian distribution whose minimum “width” is a
function of the norm of the lattice basis. We follow the works of |23I4I111113) . 

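Definition 6. Let $m \in \mathbb{Z}_{>0}$ be a positive integer and $\Lambda \subset \mathbb{R}^m$ an $m$-dimensional
lattice. For any vector $c \in \mathbb{R}^m$ and any positive parameter $\sigma \in \mathbb{R}_{>0}$, we define:

$\rho_{\sigma,c}(x) = \exp\!\big(-\pi \|x - c\|^2 / \sigma^2\big)$: a Gaussian-shaped function on $\mathbb{R}^m$ with center $c$
and parameter $\sigma$. (For $x \in \mathbb{R}$, $\rho_{\sigma,0}(x)$ is proportional to the normal probability
density of variance $\sigma^2/2\pi$ and mean 0.)

$\rho_{\sigma,c}(\Lambda) = \sum_{x \in \Lambda} \rho_{\sigma,c}(x)$: the (always converging) discrete integral of $\rho_{\sigma,c}$ over
the lattice $\Lambda$.

$\mathcal{D}_{\Lambda,\sigma,c}$: the discrete Gaussian distribution over $\Lambda$ with center $c$ and parameter
$\sigma$,

$$\forall y \in \Lambda, \quad \mathcal{D}_{\Lambda,\sigma,c}(y) = \rho_{\sigma,c}(y) / \rho_{\sigma,c}(\Lambda).$$

For notational convenience, the origin-centered $\rho_{\sigma,0}$ and $\mathcal{D}_{\Lambda,\sigma,0}$ are abbreviated as
$\rho_\sigma$ and $\mathcal{D}_{\Lambda,\sigma}$.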
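In one dimension ($\Lambda = \mathbb{Z}$) these definitions are easy to evaluate directly. The following minimal sketch, in plain Python with hypothetical function names, computes $\rho_{\sigma,c}$ and the probabilities $\mathcal{D}_{\mathbb{Z},\sigma,c}(y)$ on a window of $\pm 12\sigma$ around $c$ (outside of which the mass is negligible), and draws samples by simple rejection. It also lets one check the variance $\sigma^2/2\pi$ quoted in the definition.

```python
import math
import random

def rho(x, sigma, c=0.0):
    """rho_{sigma,c}(x) = exp(-pi (x - c)^2 / sigma^2) for x in R (dimension 1)."""
    return math.exp(-math.pi * (x - c) ** 2 / sigma ** 2)

def window(sigma, c, tail=12):
    """Integers within tail*sigma of c; the discrete Gaussian mass outside is negligible."""
    return range(math.floor(c - tail * sigma), math.ceil(c + tail * sigma) + 1)

def pmf(sigma, c=0.0):
    """D_{Z,sigma,c}(y) = rho_{sigma,c}(y) / rho_{sigma,c}(Z), normalizer summed over the window."""
    pts = window(sigma, c)
    total = sum(rho(y, sigma, c) for y in pts)
    return {y: rho(y, sigma, c) / total for y in pts}

def sample(sigma, c=0.0, rng=random):
    """Rejection sampling: propose y uniformly in the window, accept with probability rho_{sigma,c}(y)."""
    pts = window(sigma, c)
    while True:
        y = rng.randint(pts.start, pts.stop - 1)
        if rng.random() < rho(y, sigma, c):
            return y

sigma = 4.0
p = pmf(sigma)
variance = sum(y * y * py for y, py in p.items())
print(round(variance, 6), round(sigma ** 2 / (2 * math.pi), 6))
rng = random.Random(0)
print([sample(sigma, 0.3, rng) for _ in range(10)])
```

Rejection from a truncated window is only one of several standard ways to sample $\mathcal{D}_{\mathbb{Z},\sigma,c}$; it is chosen here for brevity, not for efficiency or constant-time behavior.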

Gentry et al. [13] show that, given a basis $B$ for a lattice $\Lambda$, one can efficiently
sample points in $\Lambda$ with discrete Gaussian distribution for sufficiently large values
of $\sigma$.

Proposition 7 ([13]). There exists a probabilistic polynomial-time algorithm
that, on input an arbitrary basis $B$ of an $m$-dimensional full-rank lattice $\Lambda = \mathcal{L}(B)$,
a parameter $\sigma \ge \|\tilde{B}\|\, \omega(\sqrt{\log m})$, and a center $c \in \mathbb{R}^m$, outputs a sample
from a distribution that is statistically close to $\mathcal{D}_{\Lambda,\sigma,c}$.

For concreteness, we will refer to the algorithm of Proposition 7 as follows:
SampleGaussian($B, \sigma, c$): On input a basis $B$ for a lattice $\Lambda \subset \mathbb{R}^m$, a positive
real parameter $\sigma \ge \|\tilde{B}\|\, \omega(\sqrt{\log m})$, and a center vector $c \in \mathbb{R}^m$, it outputs
a fresh random lattice vector $x \in \Lambda$ drawn from a distribution statistically
close to $\mathcal{D}_{\Lambda,\sigma,c}$.
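The algorithm behind Proposition 7 is the randomized nearest-plane procedure of Gentry et al. [13]. It processes the basis vectors from last to first. At each step it samples an integer coefficient $z_i$ from a one-dimensional discrete Gaussian with parameter $\sigma / \|\tilde{b}_i\|$, centered at the coordinate of the current center along $\tilde{b}_i$, and then moves the center by $z_i b_i$. The sketch below is a toy floating-point rendition under our own assumptions (a rejection sampler for the one-dimensional steps, no attention to side channels, hypothetical function names), meant only to show the structure. The example basis spans $\Lambda^\perp(A)$ for $A = (1, 2, 3)$ and $q = 101$. Its Gram-Schmidt norm is about 27, so $\sigma = 60$ is comfortably large for a toy.

```python
import math
import random

def gram_schmidt(basis):
    ortho = []
    for b in basis:
        v = list(map(float, b))
        for u in ortho:
            mu = sum(x * y for x, y in zip(b, u)) / sum(y * y for y in u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho

def sample_z(sigma, c, rng):
    """One-dimensional D_{Z,sigma,c} by rejection from the window c +/- 12 sigma."""
    lo, hi = math.floor(c - 12 * sigma), math.ceil(c + 12 * sigma)
    while True:
        y = rng.randint(lo, hi)
        if rng.random() < math.exp(-math.pi * (y - c) ** 2 / sigma ** 2):
            return y

def sample_gaussian(basis, sigma, center, rng=random):
    """Randomized nearest-plane sampler; `basis` lists the lattice vectors b_1, ..., b_m."""
    gs = gram_schmidt(basis)
    c = list(map(float, center))
    v = [0] * len(center)
    for b, bt in zip(reversed(basis), reversed(gs)):
        nb2 = sum(x * x for x in bt)
        c_i = sum(x * y for x, y in zip(c, bt)) / nb2      # coordinate of the center along b~_i
        z = sample_z(sigma / math.sqrt(nb2), c_i, rng)     # parameter sigma / ||b~_i||
        c = [ci - z * bi for ci, bi in zip(c, b)]
        v = [vi + z * bi for vi, bi in zip(v, b)]
    return v

B = [[-2, 1, 0], [-3, 0, 1], [101, 0, 0]]   # basis of Lambda^perp((1, 2, 3)) modulo 101
print(sample_gaussian(B, 60.0, [10.5, -3.0, 7.0], random.Random(3)))
```

Every output satisfies $x_1 + 2x_2 + 3x_3 \equiv 0 \pmod{101}$ whatever the center, since it is an integer combination of the basis vectors; the Gaussian parameter only controls which lattice points are likely.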

2.4 Smoothing Parameter 

We recall the notion of the smoothing parameter of a lattice, which lower-bounds
the “density” of points on a lattice across all directions, and how this relates to
discrete Gaussian sampling on the lattice.

Micciancio and Regev [19] define the smoothing parameter of a lattice as
follows.

Definition 8 ([19]). For any $m$-dimensional lattice $\Lambda$ and any positive real
$\epsilon > 0$, the smoothing parameter $\eta_\epsilon(\Lambda)$ is the smallest real $\eta > 0$ such that
$\rho_{1/\eta}(\Lambda^* \setminus \{0\}) \le \epsilon$.
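For the one-dimensional lattice $\mathbb{Z}$, which is its own dual, the condition reads $2\sum_{k \ge 1} \exp(-\pi k^2 \eta^2) \le \epsilon$, so $\eta_\epsilon(\mathbb{Z})$ can be found numerically. The following sketch, in plain Python with hypothetical names, does so by bisection, relying on the fact that the left-hand side decreases in $\eta$. Since the $k = 1$ term dominates, the result is close to $\sqrt{\ln(2/\epsilon)/\pi}$.

```python
import math

def rho_dual_tail(eta, terms=60):
    """rho_{1/eta}(Z* minus {0}) for Z* = Z, i.e. 2 * sum_{k>=1} exp(-pi k^2 eta^2)."""
    return 2 * sum(math.exp(-math.pi * (k * eta) ** 2) for k in range(1, terms + 1))

def smoothing_parameter_Z(eps, lo=1e-3, hi=100.0, iters=200):
    """Smallest eta with rho_{1/eta}(Z minus {0}) <= eps, by bisection (the sum decreases in eta)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if rho_dual_tail(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi

for eps in (0.01, 2.0 ** -40):
    eta = smoothing_parameter_Z(eps)
    print(eps, round(eta, 4), round(math.sqrt(math.log(2 / eps) / math.pi), 4))
```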

Micciancio and Regev [19] show that a sample from a discrete Gaussian is exponentially
unlikely to lie far from its center.

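Proposition 9 ([19]). For any lattice $\Lambda$ of integer dimension $m$, any point $c$,
and any two reals $\epsilon \in (0,1)$ and $\eta \ge \eta_\epsilon(\Lambda)$,

$$\Pr\big[\, x \sim \mathcal{D}_{\Lambda,\eta,c} : \|x - c\| > \sqrt{m}\,\eta \,\big] \le \frac{1+\epsilon}{1-\epsilon}\, 2^{-m}.$$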

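Peikert and Rosen [22] show that the discrete Gaussian distribution assigns an exponentially
small probability to any single lattice point.

Proposition 10 ([22]). For any lattice $\Lambda$ of integer dimension $m$, any center
$c \in \mathbb{R}^m$, any two reals $\epsilon \in (0,1)$ and $\eta \ge 2\eta_\epsilon(\Lambda)$, and any lattice point $x \in \Lambda$,

$$\mathcal{D}_{\Lambda,\eta,c}(x) \le \frac{1+\epsilon}{1-\epsilon}\, 2^{-m}.$$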


2.5 Statistical Mixing 

We recall some useful statistical mixing properties relating to the reduction of 
an integer vector modulo a lattice to yield a syndrome. 

Ajtai [5], then Regev [23], show that binary combinations of enough vectors
almost always span the space.

Proposition 11 ([23]). Let $m \ge 2 n \log q$. Then for all except at most a $q^{-n}$
fraction of matrices $A \in \mathbb{Z}_q^{n\times m}$, the subset sums of the columns of $A$ generate
$\mathbb{Z}_q^n$. In other words, for every syndrome $u \in \mathbb{Z}_q^n$ there exists a binary vector
$e \in \{0,1\}^m$ such that $A e = u \pmod{q}$.
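At toy sizes the statement can be checked exhaustively. Taking the logarithm in base 2 (our assumption), for $n = 1$ and $q = 5$ the condition $m \ge 2 n \log q$ asks for $m \ge 5$, and there are only $5^5 = 3125$ matrices $A \in \mathbb{Z}_5^{1 \times 5}$, each with $2^5$ subset sums. The sketch below, in plain Python with hypothetical names, counts the matrices whose subset sums miss some syndrome and compares that fraction with $q^{-n} = 1/5$. It illustrates the bound at one small size and is not evidence for the asymptotic statement.

```python
from itertools import product

def subset_sums(A, q):
    """All syndromes A e mod q over binary vectors e in {0,1}^m (exhaustive; toy sizes only)."""
    n, m = len(A), len(A[0])
    return {tuple(sum(A[i][j] * e[j] for j in range(m)) % q for i in range(n))
            for e in product((0, 1), repeat=m)}

def spans_by_subset_sums(A, q):
    return len(subset_sums(A, q)) == q ** len(A)

q, n, m = 5, 1, 5
bad = sum(not spans_by_subset_sums([list(row)], q) for row in product(range(q), repeat=m))
print(bad, "of", q ** (n * m), "matrices fail; the bound q^-n allows", q ** (n * m) // q ** n)
```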

Gentry et al. [13] show that short Gaussian combinations of any spanning vector
set yield uniformity.
Proposition 12 ([13]). Assume the columns of $A \in \mathbb{Z}_q^{n\times m}$ generate $\mathbb{Z}_q^n$, and
let $\epsilon \in (0,1)$ and $\eta \ge \eta_\epsilon(\Lambda^\perp(A))$. Then for $e \sim \mathcal{D}_{\mathbb{Z}^m,\eta}$ the distribution of the
syndrome $u = A e \bmod q$ is within statistical distance $2\epsilon$ of uniform over $\mathbb{Z}_q^n$.

Furthermore, fix $u \in \mathbb{Z}_q^n$ and let $c \in \mathbb{Z}^m$ be an arbitrary solution to $A c = u \pmod{q}$.
Then the conditional distribution of $e \sim \mathcal{D}_{\mathbb{Z}^m,\eta}$ given $A e = u \pmod{q}$
is exactly $c + \mathcal{D}_{\Lambda^\perp(A),\eta,-c}$.

Gentry et al. [13] then show that for random $A$ the lattice $\Lambda(A)$ has large minimal
distance in $\ell_\infty$, and thus that $\Lambda^\perp(A)$ has small smoothing parameter.

Proposition 13 ([13]). Let $q$ be a prime and $n$ and $m$ be two integers satisfying
$m \ge 2 n \log q$. Then, for all but at most a $q^{-n}$ fraction of matrices $A \in \mathbb{Z}_q^{n\times m}$,
it holds that $\lambda_1^\infty(\Lambda(A)) \ge q/4$. Also, for any such $A$ and any $\omega(\sqrt{\log m})$
function, there is a negligible function $\epsilon(m)$ such that the smoothing parameter
$\eta_\epsilon(\Lambda^\perp(A)) \le \omega(\sqrt{\log m})$.

Combining the previous propositions, Gentry et al. [13] summarize the results
as follows.

Fact 14. Fix a prime $q$ and two integers $n$ and $m$ satisfying $m \ge 2 n \log q$. For
all but at most a $2 q^{-n}$ fraction of matrices $A \in \mathbb{Z}_q^{n\times m}$, and for any Gaussian parameter
$\eta \ge \omega(\sqrt{\log m})$, for $e \sim \mathcal{D}_{\mathbb{Z}^m,\eta}$ the distribution of the syndrome $u = A e \bmod q$
is statistically close to uniform over $\mathbb{Z}_q^n$.
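For $n = 1$ the law of the syndrome can be computed exactly. $\mathcal{D}_{\mathbb{Z}^m,\eta}$ is a product of independent one-dimensional laws $\mathcal{D}_{\mathbb{Z},\eta}$ (the function $\rho_\eta$ factorizes over coordinates), so $u = \langle a, e\rangle \bmod q$ is a convolution over $\mathbb{Z}_q$. The sketch below is in plain Python, with toy values $q = 7$ and $\eta = 4$ and a row $a$ that are our own choices. It compares a single coordinate, whose reduction modulo $q$ is visibly non-uniform, with $m = 8 \ge 2\log_2 7$ coordinates, where the statistical distance to uniform becomes negligible.

```python
import math

def dgauss_mod_q(eta, q, tail=12):
    """Law of x mod q for x ~ D_{Z,eta}, truncated at |x| <= tail*eta (the rest is negligible)."""
    r = math.ceil(tail * eta)
    w = [0.0] * q
    for x in range(-r, r + 1):
        w[x % q] += math.exp(-math.pi * x * x / eta ** 2)
    total = sum(w)
    return [v / total for v in w]

def syndrome_law(a, q, eta):
    """Exact law of u = <a, e> mod q for e ~ D_{Z^m,eta} (n = 1), by convolution over Z_q."""
    base = dgauss_mod_q(eta, q)
    law = [1.0] + [0.0] * (q - 1)                 # empty sum: point mass at 0
    for aj in a:
        step = [0.0] * q
        for x, px in enumerate(base):             # law of aj * e_j mod q
            step[(aj * x) % q] += px
        law = [sum(law[t] * step[(u - t) % q] for t in range(q)) for u in range(q)]
    return law

def dist_to_uniform(p):
    return 0.5 * sum(abs(pi - 1.0 / len(p)) for pi in p)

q, eta = 7, 4.0
one = syndrome_law([1], q, eta)
eight = syndrome_law([1, 2, 3, 4, 5, 6, 1, 2], q, eta)
print(round(dist_to_uniform(one), 4), dist_to_uniform(eight))
```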


2.6 Preimage Sampling 

We recall the notion of preimage-samplable functions (PSF) defined in [13],
which is based on the combination of a trapdoor construction for integer lattices
and an efficient discrete Gaussian sampling algorithm.

Consider a uniform matrix $A \in \mathbb{Z}_q^{n\times m}$ and a low-norm basis $T_A$ for the lattice
$\Lambda^\perp(A)$. Used in the discrete Gaussian sampling algorithm, the short basis $T_A$
can act as a trapdoor for finding small non-zero solutions $e \in \mathbb{Z}^m$ of the equation
$A e = 0 \pmod{q}$, or more generally $A e = u \pmod{q}$ for any $u \in \mathbb{Z}_q^n$. This
leads to the notion of preimage-samplable functions [13].

We give the following definition of preimage-samplable function, following [13]:

Definition 15. Let $\lambda$, $q$, $n$, $m$, and $L$ be as in Fact 5. Let $\sigma \ge L\, \omega(\sqrt{\log m})$ be
some Gaussian parameter. A preimage-samplable function family is a collection
of maps $f_A : D_{\mathbb{Z}^m,\sigma} \to \mathbb{Z}_q^n$ from $D_{\mathbb{Z}^m,\sigma} = \{ e \in \mathbb{Z}^m : \|e\| \le \sqrt{m}\,\sigma \} \subset \mathbb{Z}^m$ into $\mathbb{Z}_q^n$,
specified by the following four algorithms:

TrapGen($1^\lambda$): On input $1^\lambda$, it uses the algorithm of Fact 5 to obtain a pair
$(A, T_A)$, where $A \in \mathbb{Z}_q^{n\times m}$ is statistically close to uniform and $T_A \subset \Lambda^\perp(A)$
is a short basis with $\|\tilde{T}_A\| \le L$. The public function parameters are $(A, q)$.
The preimage-sampling trapdoor is the basis $T_A$.
EvalFun($A, q, e$): On input function parameters $(A, q)$ and an input point $e \in D_{\mathbb{Z}^m,\sigma}$,
it outputs the image $f_A(e) = A e \bmod q$ in $\mathbb{Z}_q^n$. (The output is undefined on large inputs $e \in \mathbb{Z}^m \setminus D_{\mathbb{Z}^m,\sigma}$.)
SampleDom($I^{(m)}, \sigma$): On input the $m \times m$ identity matrix and a Gaussian
parameter $\sigma$, it outputs $e \leftarrow$ SampleGaussian($I^{(m)}, \sigma, 0$), i.e., it outputs an
element $e \in \mathbb{Z}^m$ such that $e \sim \mathcal{D}_{\mathbb{Z}^m,\sigma}$. The input matrix $I^{(m)}$ conveys the
dimension $m$, and its columns give a basis for Gaussian sampling in the lattice
$\mathbb{Z}^m$. By Proposition 9, with overwhelming probability $e \in D_{\mathbb{Z}^m,\sigma}$.
SamplePre($A, q, T_A, \sigma, u$): On input function parameters $A$ and $q$, a trapdoor $T_A$,
a Gaussian parameter $\sigma$ as above, and a target image $u \in \mathbb{Z}_q^n$,
it samples a preimage $e \in D_{\mathbb{Z}^m,\sigma}$ from the distribution $\mathcal{D}_{\mathbb{Z}^m,\sigma}$ conditioned
on the event that $A e = u \pmod{q}$. To do this, it solves for an arbitrary
solution $c \in \mathbb{Z}^m$ of the linear system $A c = u \pmod{q}$; it then samples
$d \leftarrow$ SampleGaussian($T_A, \sigma, -c$) $\sim \mathcal{D}_{\Lambda^\perp(A),\sigma,-c}$ and outputs $e = c + d$ in $\mathbb{Z}^m$.
By Proposition 9, with overwhelming probability $e \in D_{\mathbb{Z}^m,\sigma}$.

The construction is correct and efficient by Proposition 12; see [13] for details.
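The four algorithms can be mimicked at toy scale. The self-contained sketch below uses a randomized nearest-plane sampler of the kind behind Proposition 7. In place of TrapGen it fixes the hypothetical instance $A = (1, 2, 3)$, $q = 101$, with a hand-made basis of $\Lambda^\perp(A)$. To keep it short, it also assumes $n = 1$ and $A_{1,1} = 1$, so that $c = (u, 0, 0)$ is an arbitrary solution of $A c = u \pmod q$; a real implementation would solve the linear system modulo $q$ in general. Nothing here is secure. The sketch only shows how SamplePre shifts a lattice Gaussian by $c$.

```python
import math
import random

def gram_schmidt(basis):
    ortho = []
    for b in basis:
        v = list(map(float, b))
        for u in ortho:
            mu = sum(x * y for x, y in zip(b, u)) / sum(y * y for y in u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho

def sample_z(sigma, c, rng):
    """One-dimensional D_{Z,sigma,c} by rejection from the window c +/- 12 sigma."""
    lo, hi = math.floor(c - 12 * sigma), math.ceil(c + 12 * sigma)
    while True:
        y = rng.randint(lo, hi)
        if rng.random() < math.exp(-math.pi * (y - c) ** 2 / sigma ** 2):
            return y

def sample_gaussian(basis, sigma, center, rng):
    """Randomized nearest-plane sampler over the lattice spanned by `basis`."""
    c, v = list(map(float, center)), [0] * len(center)
    for b, bt in zip(reversed(basis), reversed(gram_schmidt(basis))):
        nb2 = sum(x * x for x in bt)
        z = sample_z(sigma / math.sqrt(nb2), sum(x * y for x, y in zip(c, bt)) / nb2, rng)
        c = [ci - z * bi for ci, bi in zip(c, b)]
        v = [vi + z * bi for vi, bi in zip(v, b)]
    return v

def eval_fun(A, q, e):
    return [sum(a * x for a, x in zip(row, e)) % q for row in A]

def sample_dom(m, sigma, rng):
    # SampleDom(I^(m), sigma): the identity basis makes each coordinate an independent D_{Z,sigma}
    return sample_gaussian([[int(i == j) for j in range(m)] for i in range(m)], sigma, [0] * m, rng)

def sample_pre(A, q, T, sigma, u, rng):
    # toy assumption: n = 1 and A[0][0] = 1, so c = (u, 0, ..., 0) solves A c = u (mod q)
    c = [u[0] % q] + [0] * (len(A[0]) - 1)
    d = sample_gaussian(T, sigma, [-x for x in c], rng)   # d ~ D_{Lambda^perp(A), sigma, -c}
    return [ci + di for ci, di in zip(c, d)]

rng = random.Random(2024)
A, q = [[1, 2, 3]], 101
T_A = [[-2, 1, 0], [-3, 0, 1], [101, 0, 0]]   # hand-made basis of Lambda^perp(A), standing in for TrapGen
e = sample_pre(A, q, T_A, 60.0, [42], rng)
print(e, eval_fun(A, q, e))
```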

2.7 Elementary Delegation 

There are several ways to delegate a short basis for $\Lambda^\perp(A)$ into one for $\Lambda^\perp([A|B])$.
If there is no one-wayness requirement on the delegation process, then Peikert [21]
describes a very effective elementary deterministic way to do this.

Proposition 16 ([21]). Take any matrix $A \in \mathbb{Z}_q^{n\times m_1}$ such that the columns
of $A$ span the group $\mathbb{Z}_q^n$. Let $B \in \mathbb{Z}_q^{n\times m_2}$ be arbitrary, and define $F = [A \mid B]$.
There exists a deterministic polynomial-time algorithm that, given $A$, $B$, and an
arbitrary basis $T_A$ for $\Lambda^\perp(A)$, outputs a basis $T_F$ for $\Lambda^\perp(F)$ while preserving the
Gram-Schmidt norm of the basis (i.e., such that $\|\tilde{T}_F\| = \|\tilde{T}_A\|$).
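One standard way to realize such a delegation is to solve $A W = -B \pmod q$ for an integer matrix $W \in \mathbb{Z}^{m_1 \times m_2}$, which is possible because the columns of $A$ span $\mathbb{Z}_q^n$, and to output the matrix below. We present it as an illustration, not as a quotation of [21].
$$T_F = \begin{pmatrix} T_A & W \\ 0 & I_{m_2} \end{pmatrix}.$$
Every column lies in $\Lambda^\perp(F)$, the determinant is unchanged, and the Gram-Schmidt vectors of the last $m_2$ columns are unit vectors, so $\|\tilde{T}_F\| = \|\tilde{T}_A\|$. The toy sketch below is in plain Python. Its instance is hypothetical and assumes $n = 1$ with $A_{1,1} = 1$, so that $W$ can be written down directly. The sketch checks all three properties.

```python
from fractions import Fraction
from math import sqrt

def gs_norm(vectors):
    """||T~||: length of the longest Gram-Schmidt vector (exact arithmetic, then a float sqrt)."""
    ortho = []
    for b in vectors:
        v = [Fraction(x) for x in b]
        for u in ortho:
            mu = sum(Fraction(x) * y for x, y in zip(b, u)) / sum(y * y for y in u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return max(sqrt(sum(float(x) ** 2 for x in v)) for v in ortho)

def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in r] for r in rows]
    n, d = len(M), Fraction(1)
    for i in range(n):
        p = next((r for r in range(i, n) if M[r][i] != 0), None)
        if p is None:
            return Fraction(0)
        if p != i:
            M[i], M[p] = M[p], M[i]
            d = -d
        d *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    return d

def extend_basis(T_A, A, B, q):
    """Columns [t; 0] for t in T_A, then [w_j; e_j] with A w_j = -b_j (mod q).
    Toy assumption: n = 1 and A[0][0] = 1, so w_j = (-b_j mod q, 0, ..., 0)."""
    m1, m2 = len(A[0]), len(B[0])
    cols = [list(t) + [0] * m2 for t in T_A]
    for j in range(m2):
        w = [(-B[0][j]) % q] + [0] * (m1 - 1)
        cols.append(w + [1 if k == j else 0 for k in range(m2)])
    return cols

q, A, B = 101, [[1, 2, 3]], [[5, 7]]
T_A = [[-2, 1, 0], [-3, 0, 1], [101, 0, 0]]        # basis of Lambda^perp(A), listed as columns
T_F = extend_basis(T_A, A, B, q)
F = A[0] + B[0]
print(all(sum(f * x for f, x in zip(F, t)) % q == 0 for t in T_F), det(T_F), gs_norm(T_F), gs_norm(T_A))
```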

2.8 Hardness Assumption 

The following lattice problem was first suggested to be hard on average by Ajtai [5]
and formally defined by Micciancio and Regev [19].

Definition 17. The Small Integer Solution (SIS) problem in $\ell_2$-norm is: given
an integer $q$, a matrix $A \in \mathbb{Z}_q^{n\times m}$, and a real $\beta$, find a non-zero integer vector
$e \in \mathbb{Z}^m$ such that $A e = 0 \pmod{q}$ and $\|e\|_2 \le \beta$. The average-case $(q, n, m, \beta)$-SIS
problem is defined similarly, where $A$ is uniformly random.
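Checking a candidate solution is straightforward. The sketch below, in plain Python with a hypothetical function name and toy instance, tests the three conditions of Definition 17. Note that $q e_1 = (q, 0, \dots, 0)$ always lies in $\Lambda^\perp(A)$ and has norm $q$, so the problem is only meaningful when $\beta < q$.

```python
import math

def is_sis_solution(A, e, q, beta):
    """Definition 17: e is non-zero, A e = 0 (mod q), and ||e||_2 <= beta."""
    nonzero = any(x != 0 for x in e)
    in_kernel = all(sum(a * x for a, x in zip(row, e)) % q == 0 for row in A)
    short = math.sqrt(sum(x * x for x in e)) <= beta
    return nonzero and in_kernel and short

A, q = [[1, 2, 3], [4, 5, 6]], 101      # toy instance, deliberately not random
print(is_sis_solution(A, [1, -2, 1], q, beta=3.0))   # short kernel vector
print(is_sis_solution(A, [q, 0, 0], q, beta=3.0))    # trivial kernel vector, too long
```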

This problem was shown to be as hard as certain worst-case lattice problems,
first by Ajtai [5], then by Micciancio and Regev [19], and Gentry et al. [13].

Proposition 18 ([13]). For any poly-bounded $m$, any $\beta = \mathrm{poly}(n)$, and any
prime $q \ge \beta \cdot \omega(\sqrt{n \log n})$, the average-case $(q, n, m, \beta)$-SIS problem is as hard as
approximating the Shortest Independent Vectors Problem (SIVP), among others,
in the worst case to within certain $\gamma = \beta \cdot \tilde{O}(\sqrt{n})$ factors.

2.9 More Useful Facts 

Lemma 19. Let $B_0 \in \mathbb{Z}_q^{n\times m}$. Let $H$ be a scalar $h \in \mathbb{Z}_q$ or a matrix $H \in \mathbb{Z}_q^{n\times n}$.
Suppose that $H$ is invertible modulo $q$ (i.e., $|H| \ne 0 \pmod{q}$ when $q$ is prime).
Then, the two preimage-samplable functions $(B_0)(\cdot) \bmod q$ and $(H B_0)(\cdot) \bmod q$
from $\mathbb{Z}^m$ into $\mathbb{Z}_q^n$ admit exactly the same trapdoors $T_{B_0} \subset \mathbb{Z}^m$.
Proof. For all $e \in \mathbb{Z}^m$ we have $B_0 e = 0 \pmod{q}$ if and only if $H B_0 e = 0 \pmod{q}$,
hence the two lattices $\Lambda^\perp(B_0)$ and $\Lambda^\perp(H B_0)$ are the same. Thus,
$T_{B_0} \subset \Lambda^\perp(B_0) \iff T_{B_0} \subset \Lambda^\perp(H B_0)$. □
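The equality of the two lattices can be observed by brute force on a toy instance. The sketch below, in plain Python where $q = 7$, $B_0$, and $H$ are arbitrary illustrative choices, enumerates all $e \in \{-3, \dots, 3\}^3$ and compares the solutions of $B_0 e = 0$ and $H B_0 e = 0 \pmod q$. It does so first for an invertible $H$ and an invertible scalar $h$, then for a singular $H$, where the second lattice becomes strictly larger and the lemma no longer applies.

```python
from itertools import product

def mat_mul_mod(X, Y, q):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % q for j in range(len(Y[0]))]
            for i in range(len(X))]

def kernel_in_box(M, q, r):
    """All e in {-r..r}^m with M e = 0 (mod q): a finite window of Lambda^perp(M)."""
    m = len(M[0])
    return {e for e in product(range(-r, r + 1), repeat=m)
            if all(sum(a * x for a, x in zip(row, e)) % q == 0 for row in M)}

q = 7
B0 = [[3, 1, 4], [1, 5, 9]]
H = [[2, 1], [1, 1]]          # determinant 1: invertible modulo q
H_bad = [[1, 1], [1, 1]]      # determinant 0: not invertible
base = kernel_in_box(B0, q, 3)
print(base == kernel_in_box(mat_mul_mod(H, B0, q), q, 3))
print(base == kernel_in_box([[3 * x % q for x in row] for row in B0], q, 3))   # scalar h = 3
print(base < kernel_in_box(mat_mul_mod(H_bad, B0, q), q, 3))                  # strictly larger
```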

3 General Simulation Framework 

We now describe the core scheme. At a high level, we achieve short signatures 
with full adaptive security by providing a relatively large number of public- 
key matrices, which are then “mixed through” together in a message-dependent 
manner — as opposed to merely juxtaposed as in the constructions of mm- 
In the simulation, the public-key matrices will hide a trapdoor component that 
has a non-negligible probability of vanishing in the mix for certain unpredictable 
choices of messages: on those messages the simulator will be unable to answer 
signature queries, but will be able instead to exploit an existential forgery. 

Our key-mixing technique is at some level reminiscent of Waters’ scheme [2jT 
in bilinear groups, but with a number of crucial differences. The farther-reaching 
difference is that in the lattice setting we can exploit the smaller groups and 
their richer structure to create a (much) more efficient “mixing” effect than in 
the large cyclic groups of the discrete-log setting. Another difference concerns 
randomization, which in a lattice setting tends to be rather more involved than 
in discrete-log settings; our approach is based on the method of randomization 
by a low-norm matrix from [1], with the small added contribution of showing that
it can be done in a way that supports the mixing effect that we need.

3.1 Two-Sided Trapdoors 

To facilitate the description of the scheme and its proof, we first construct a 
preimage-samplable function of a special form that will be able to sample short 
preimages from the same distribution, using either one of two types of trapdoors: 
“firm” trapdoors will be used in the real scheme, and will never fail to work; 
“fickle” trapdoors will be used in the simulation, and will be fragile by design. 

Lattices with dual trapdoors were first introduced in |91Tj . Here, we seek to 
let the matrix $R$, below, be generated as a mixture of certain low-norm matrices.
All the algorithms in this subsection are adapted from §4 of [1].

Definition 20. Consider an algorithm TwoSideGen($1^\lambda$) that outputs two random
matrices $A \in \mathbb{Z}_q^{n\times m}$ and $R \in \mathbb{Z}^{m\times m}$, where $A$ is uniform and $R$ has some
distribution $\mathcal{R}$. Let $B \in \mathbb{Z}_q^{n\times m}$ be an independent third matrix. Write $A R$ as
shorthand for $(A R \bmod q) \in \mathbb{Z}_q^{n\times m}$, and define

$$F = [\, A \mid A R + B \,] \in \mathbb{Z}_q^{n\times 2m}.$$

We say that the pair (F, q) defines the public parameters of a two-sided function. 

The following lemmas show that a two-sided function $(F, q)$ is a preimage-samplable
function given a trapdoor for either $A$ or $B$, provided that $A$ and
$R$ are drawn from suitable distributions.
Lemma 21. For any parameter $\eta \ge \omega(\sqrt{\log m})$, there exists an efficiently
samplable distribution $\mathcal{R}_\eta$ over $\mathbb{Z}^{m\times m}$ such that, with overwhelming probability,
independent samples $R_i \sim \mathcal{R}_\eta$ have norm $\|R_i\| \le \sqrt{m}\,\eta$, and such that
for $(A, R) \sim U(\mathbb{Z}_q^{n\times m}) \times \mathcal{R}_\eta$ and fixed $B \in \mathbb{Z}_q^{n\times m}$ the matrix $F = [A \mid A R + B] \in \mathbb{Z}_q^{n\times 2m}$
is statistically close to uniform.

Proof. According to Fact 14, it suffices to pick the columns of $R$ independently
with distribution $\mathcal{D}_{\mathbb{Z}^m,\eta}$. □

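Lemma 22 (“Firm” trapdoor). Let $L$ and $\sigma$ be as in Definition 15, and $\mathcal{R}_\eta$ as in
Lemma 21. If $[A \mid B] \sim U(\mathbb{Z}_q^{n\times 2m})$, $R \sim \mathcal{R}_\eta$, and $T_A \subset \Lambda^\perp(A)$ is a basis of norm $\|\tilde{T}_A\| \le L$,
then the pair $(F = [A \mid A R + B], q)$ is a preimage-samplable function in the sense of Definition 15.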

Proof. Per Lemma 21, $F$ is statistically close to uniform in $\mathbb{Z}_q^{n\times 2m}$, thus $F$ has
the right distribution. It remains to show how to perform public and trapdoor
sampling.

SampleDom. To sample short vectors $e \sim \mathcal{D}_{\mathbb{Z}^{2m},\sigma}$ in the domain of $F$, one
proceeds exactly as in the GPV scheme, i.e., by executing SampleDom($I^{(2m)}, \sigma$),
which does not require any trapdoor.

SamplePre. For preimage sampling, we show how to sample a short preimage
$e \in \mathbb{Z}^{2m}$ of any $u \in \mathbb{Z}_q^n$ with conditional distribution $\mathcal{D}_{\mathbb{Z}^{2m},\sigma} \mid F e = u \pmod{q}$.
Since a random $A \in \mathbb{Z}_q^{n\times m}$ will almost always span all of $\mathbb{Z}_q^n$, we can use the
deterministic delegation mechanism of Proposition 16 to obtain a basis $T_F$ for
$\Lambda^\perp(F)$ with short Gram-Schmidt norm $\|\tilde{T}_F\| \le L$. Having such a trapdoor $T_F$ for $F$,
we invoke SamplePre($F, q, T_F, \sigma, u$) to obtain a short random preimage $e$. □

Lemma 23 (“Fickle” trapdoor). Let $L$ be as in Definition 15, $\eta$ as in Lemma 21,
and $\sigma = L'\, \omega(\sqrt{\log m})$, where $L' = 2\eta\sigma'\sqrt{\ell}\, m$ and where $\sigma' \ge L\, \omega(\sqrt{\log m})$.
Fix a matrix $B \in \mathbb{Z}_q^{n\times m}$ with a short basis $T_B$ of orthogonalized norm $\|\tilde{T}_B\| \le L$.
For $(A, R)$ such that $[A \mid A R] \sim U(\mathbb{Z}_q^{n\times 2m})$ and $\|R\| \le \eta\sqrt{\ell m}$, the pair
$(F = [A \mid A R + B], q)$ is a preimage-samplable function in the sense of Definition 15.

In this lemma, we allow $\|R\| \le \eta\sqrt{\ell m}$, where the factor $\sqrt{\ell}$ will account for the
fact that in the simulation the matrix $R = R_{\mathsf{msg}} = \sum_{i=0}^{\ell} (-1)^{\mathsf{msg}[i]} R_i$ for independent $R_i$
of norm $\|R_i\| \le \eta\sqrt{m}$, with coefficients $\pm 1$ that depend on the message msg.

Proof. Per Lemma 21, $F$ is statistically close to uniform in $\mathbb{Z}_q^{n\times 2m}$, thus $F$ has
the right distribution. We need to show how to perform public and trapdoor
sampling.

SampleDom. Sampling short vectors $e \sim \mathcal{D}_{\mathbb{Z}^{2m},\sigma}$ is done without any trapdoor
by invoking SampleDom($I^{(2m)}, \sigma$), as in the previous lemma.

SamplePre. For preimage sampling, we need to show, given any input $u \in \mathbb{Z}_q^n$,
how to sample a short preimage $e \in \mathbb{Z}^{2m}$ of $u$ with conditional distribution
$\mathcal{D}_{\mathbb{Z}^{2m},\sigma} \mid F e = u \pmod{q}$. We do this in three steps:

1. We build a full-rank set $S_F \subset \Lambda^\perp(F)$ such that $\|\tilde{S}_F\| \le 2\eta L \sqrt{\ell}\, m\, \omega(\sqrt{\log m})$.
This is done by independently sampling short vectors $e_i \in \Lambda^\perp(F)$ until a linearly
independent set of $2m$ such vectors is found. To sample one short vector $e \in \Lambda^\perp(F)$
given the trapdoor $T_B$, we compute $d_1 \leftarrow$ SampleDom($I^{(m)}, (\eta\sqrt{\ell} - 1)\,\sigma'$)
and $d_2 \leftarrow$ SamplePre($B, q, T_B, \sigma', -A d_1$), and define

$$d = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \in \mathbb{Z}^{2m}, \qquad e = \begin{bmatrix} d_1 - R\, d_2 \\ d_2 \end{bmatrix} = \begin{bmatrix} I & -R \\ 0 & I \end{bmatrix} d \in \mathbb{Z}^{2m}.$$

Observe that $e$ is a fixed invertible linear function of $d$, and that $d$ is discrete
Gaussian by construction. A result of Regev [23] shows that, with overwhelming
probability, at most $4m^2$ samples will be needed to get $2m$ linearly independent
vectors “$d$”, and therefore also $2m$ linearly independent vectors “$e$”. For each $e$,
we have $F e = A(d_1 - R d_2) + (A R + B) d_2 = A d_1 + B d_2 = A d_1 - A d_1 = 0 \in \mathbb{Z}_q^n$,
hence $e \in \Lambda^\perp(F)$. We also have $\|e\| \le (\eta\sqrt{\ell} - 1)\,\sigma'\sqrt{m} + \eta\sigma' m\sqrt{\ell} + \sigma'\sqrt{m} \le 2\eta\sigma' m\sqrt{\ell}$.
Thus by assembling $2m$ linearly independent such vectors “$e$”, we
obtain a full-rank set $S_F \subset \Lambda^\perp(F)$ of orthogonalized norm $\|\tilde{S}_F\| \le 2\eta\sigma' m\sqrt{\ell}$.

2. We convert the short set $S_F$ into an equally short basis $T_F$, i.e., such that
$\|\tilde{T}_F\| \le \|\tilde{S}_F\|$. We can do this efficiently using the algorithm of Fact 3, starting
from an arbitrary basis for $\Lambda^\perp(F)$, itself easy to construct by linear algebra.

3. We use the newly constructed basis $T_F$ to sample a short preimage $e$ of
the given target $u \in \mathbb{Z}_q^n$, using $e \leftarrow$ SamplePre($F, q, T_F, \sigma, u$). Notice that the
Gaussian parameter $\sigma \ge \|\tilde{T}_F\|\, \omega(\sqrt{\log m})$, so the algorithm SamplePre can be
applied with the stated parameters, and hence $e$ sampled in this manner will
have conditional distribution $e \sim \mathcal{D}_{\mathbb{Z}^{2m},\sigma} \mid F e = u \pmod{q}$. □
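The algebra of Step 1 can be checked independently of the Gaussian sampling. The sketch below is in plain Python, with toy sizes and hypothetical names. In it, $d_2$ is not sampled with SamplePre. It is obtained by a shortcut that assumes $n = 1$ and $B_{1,1} = 1$, so it solves $B d_2 = -A d_1 \pmod q$ without being short. The point is only that $e = [\, d_1 - R d_2 ;\ d_2 \,]$ always satisfies $F e = 0 \pmod q$ for $F = [A \mid A R + B]$, whatever $R$ is.

```python
import random

def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def two_sided(A, R, B, q):
    """F = [A | A R + B] mod q, for n x m matrices A, B and an m x m matrix R."""
    AR = [[sum(A[i][k] * R[k][j] for k in range(len(R))) for j in range(len(R[0]))]
          for i in range(len(A))]
    return [A[i] + [(AR[i][j] + B[i][j]) % q for j in range(len(B[0]))] for i in range(len(A))]

def kernel_vector(A, R, B, q, d1):
    """e = [d1 - R d2 ; d2] with B d2 = -A d1 (mod q).
    Toy shortcut (hypothetical): n = 1 and B[0][0] = 1, so d2 = (-A d1 mod q, 0, ..., 0);
    Step 1 instead samples a short d2 with SamplePre(B, q, T_B, sigma', -A d1)."""
    t = matvec(A, d1)[0]
    d2 = [(-t) % q] + [0] * (len(B[0]) - 1)
    return [x - y for x, y in zip(d1, matvec(R, d2))] + d2

rng = random.Random(11)
q, m = 101, 4
A = [[rng.randrange(q) for _ in range(m)]]
B = [[1] + [rng.randrange(q) for _ in range(m - 1)]]
R = [[rng.randint(-2, 2) for _ in range(m)] for _ in range(m)]
F = two_sided(A, R, B, q)
for _ in range(20):
    d1 = [rng.randint(-3, 3) for _ in range(m)]
    e = kernel_vector(A, R, B, q, d1)
    assert matvec(F, e)[0] % q == 0          # F e = A d1 + B d2 = 0 (mod q)
```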

Remark 24. Agrawal et al. [2] show the sampling overhead is only a factor $\le 2$,
hence in Step 1 we need to sample at most $4m$ vectors “$e$” in expectation.

We also mention that a lower-norm fickle trapdoor may be obtained by using
the Alwen-Peikert delegation method as in Lemma 22 instead of the repeated
sampling as above. We shall present it in the full version.

The point of the two-sided preimage-samplable function is that in the actual
scheme we use the “firm” preimage mechanism with an always-available trapdoor
$T_A$, whereas in the simulation we use the “fickle” preimage mechanism $T_B$ for a
matrix $B = h_{\mathsf{msg}} B_0$ that sometimes vanishes.


3.2 Main Signature Scheme 

The following is our core construction of a fully secure short signature. It is very 
simple and already achieves most of the compactness benefits while illustrating 
the framework. In the full version, we show how to squeeze out some additional 
factor from the signature size, albeit at the cost of a more complex system. 

From now on, a message msg is an $\ell$-bit string $(\mathsf{msg}[1], \dots, \mathsf{msg}[\ell]) \in \{0,1\}^\ell$
indexed from 1 to $\ell$, augmented with a 0-th dummy extra bit set to $\mathsf{msg}[0] = 0$.
This will let us easily include a constant term of index 0 in various summations.

KeyGen($1^\lambda$): On input a security parameter $\lambda$ in unary, do these steps:

1. Draw an $n$-by-$m$ matrix $A_0 \in \mathbb{Z}_q^{n\times m}$ with a short basis $T_{A_0} \subset \Lambda^\perp(A_0)$.

- Do so by invoking TrapGen($1^\lambda$), resulting in $T_{A_0}$ such that $\|\tilde{T}_{A_0}\| \le L$.

2. Draw $\ell + 1$ independent $n$-by-$m$ matrices $C_0, \dots, C_\ell \in \mathbb{Z}_q^{n\times m}$.

3. Output the signing and verification keys,

$$SK = (T_{A_0}) \in \mathbb{Z}^{m\times m}, \qquad VK = (A_0, C_0, \dots, C_\ell) \in (\mathbb{Z}_q^{n\times m})^{\ell+2}.$$

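Sign(SK, msg): On input a signing key SK and a message $\mathsf{msg} \in \{0\} \times \{0,1\}^\ell$:

1. Define the $n$-by-$m$ matrix $C_{\mathsf{msg}} = \sum_{i=0}^{\ell} (-1)^{\mathsf{msg}[i]} C_i$.

2. Define the message-dependent matrix $F_{\mathsf{msg}} = [A_0 \mid C_{\mathsf{msg}}] \in \mathbb{Z}_q^{n\times 2m}$.

3. Sample a short non-zero random point $d \in \Lambda^\perp(F_{\mathsf{msg}})$, using $SK = T_{A_0}$.

- Do so by sampling $d \sim \mathcal{D}_{\mathbb{Z}^{2m},\sigma} \mid F_{\mathsf{msg}}\, d = 0$, using Lemma 22.

4. Output the digital signature,

$$\mathsf{sig}_{\mathsf{msg}} = (d) \in \mathbb{Z}^{2m}.$$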

Verify(VK, msg, $\mathsf{sig}_{\mathsf{msg}}$): On input a verification key VK, a message msg, and a
signature $\mathsf{sig}_{\mathsf{msg}}$:

1. Check that the message msg is well formed in $\{0\} \times \{0,1\}^\ell$.

2. Check that the signature $\mathsf{sig}_{\mathsf{msg}}$ is a small but non-zero vector.

- Do so by verifying that $\mathsf{sig}_{\mathsf{msg}} = d \in \mathbb{Z}^{2m}$ and $0 < \|d\| \le \sqrt{2m} \cdot \sigma$.

3. Check that $\mathsf{sig}_{\mathsf{msg}}$ is a point on the “mixed” lattice specified by msg.

- Do so by verifying that

$$\Big[\, A_0 \ \Big|\ \sum_{i=0}^{\ell} (-1)^{\mathsf{msg}[i]} C_i \,\Big]\, d = 0 \pmod{q}.$$

4. If all the verifications pass, accept the signature; otherwise, reject.
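The verification steps translate directly into code. The sketch below, in plain Python with hypothetical function names and a toy key, implements the message mixing $C_{\mathsf{msg}}$ with the dummy bit $\mathsf{msg}[0] = 0$ prepended, followed by the three checks. It does not implement Sign, which needs the trapdoor sampler of Lemma 22. The example signature $d = (1, -2, 1, 0, 0, 0)$ is simply a short vector with $A_0 d_1 = 0$ and $d_2 = 0$, which passes verification for every message; an honest signature would instead have a Gaussian $d_2$.

```python
import math

def mixed_matrix(Cs, msg_bits, q):
    """C_msg = sum_{i=0}^{l} (-1)^{msg[i]} C_i mod q, with the dummy bit msg[0] = 0 prepended."""
    bits = [0] + list(msg_bits)
    rows, cols = len(Cs[0]), len(Cs[0][0])
    return [[sum((-1) ** b * C[r][c] for b, C in zip(bits, Cs)) % q for c in range(cols)]
            for r in range(rows)]

def verify(A0, Cs, msg_bits, d, q, sigma):
    m = len(A0[0])
    if len(msg_bits) != len(Cs) - 1 or any(b not in (0, 1) for b in msg_bits):
        return False                                   # 1. well-formed message
    norm = math.sqrt(sum(x * x for x in d))
    if len(d) != 2 * m or not 0 < norm <= math.sqrt(2 * m) * sigma:
        return False                                   # 2. small but non-zero
    F = [A0[r] + row for r, row in enumerate(mixed_matrix(Cs, msg_bits, q))]
    return all(sum(f * x for f, x in zip(row, d)) % q == 0 for row in F)   # 3. on the mixed lattice

q, sigma = 101, 10.0
A0 = [[1, 2, 3]]
Cs = [[[4, 9, 2]], [[7, 1, 8]], [[3, 5, 11]]]          # C_0, C_1, C_2 (so l = 2)
print(verify(A0, Cs, [0, 1], [1, -2, 1, 0, 0, 0], q, sigma))
print(verify(A0, Cs, [0, 1], [1, -2, 1, 1, 0, 0], q, sigma))
```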


3.3 Security Reduction 

It is easy to see by inspection that the signature scheme is consistent with overwhelming
probability.

The next theorem reduces the SIS problem to the existential forgery of our
signature. The proof involves a moderate polynomial SIS parameter $\beta$. The expression
of $\beta$ arises in Lemma 26 but otherwise “passes through” the reduction.
In §3.4 we revisit the question of the lattice parameters in greater detail.

Theorem 25. For a prime modulus $q = q(\lambda)$, if there is a probabilistic algorithm
A that outputs an existential signature forgery, with probability $\epsilon$, in time $\tau$, and
making $Q \le q/2$ adaptive chosen-message queries, then there is a probabilistic
algorithm B that solves the $(q, n, m, \beta)$-SIS problem in time $\tau' \approx \tau$ and with
probability $\epsilon' \ge \epsilon/(3q)$, for some polynomial function $\beta = \mathrm{poly}(\lambda)$.

Proof. Suppose that there exists such a forger A. We construct a solver B that 
simulates an attack environment and uses the forgery to create its solution. The 
various operations performed by B are the following. 
Invocation. B is invoked on a random instance of the $(q, n, m, \beta)$-SIS problem,
and is asked to return an admissible solution.

- Supplied: an $n$-by-$m$ matrix $A_0 \in \mathbb{Z}_q^{n\times m}$ from the uniform distribution.

- Requested: any $e_0 \in \mathbb{Z}^m$ such that $A_0 e_0 = 0 \pmod{q}$ and $0 < \|e_0\| \le \beta$.

Setup. B gives to the adversary A a simulated verification key constructed as
follows:

1. Pick a random matrix $B_0 \in \mathbb{Z}_q^{n\times m}$ with a short basis $T_{B_0} \subset \Lambda^\perp(B_0)$.

- Do so by invoking TrapGen($1^\lambda$), resulting in $T_{B_0}$ such that $\|\tilde{T}_{B_0}\| \le L$.

2. Pick $\ell + 1$ short random square $m$-by-$m$ matrices $R_0, \dots, R_\ell \in \mathbb{Z}^{m\times m}$.

- Do so by independently sampling the columns of the $R_i \sim \mathcal{D}_{\mathbb{Z}^m,\eta}$.

3. Pick $\ell$ uniformly random scalars $h_1, \dots, h_\ell \in \mathbb{Z}_q$ and fix $h_0 = 1 \in \mathbb{Z}_q$.

4. Output the verification key $VK = \big(A_0,\ C_0 = (A_0 R_0 + h_0 B_0) \bmod q,\ C_1 = (A_0 R_1 + h_1 B_0) \bmod q,\ \dots,\ C_\ell = (A_0 R_\ell + h_\ell B_0) \bmod q\big)$.
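This key is useful because of linearity. For every message, $C_{\mathsf{msg}} = \sum_i (-1)^{\mathsf{msg}[i]} C_i = A_0 R_{\mathsf{msg}} + h_{\mathsf{msg}} B_0 \pmod q$, with $R_{\mathsf{msg}}$ and $h_{\mathsf{msg}}$ defined by the same signed sums. The $B_0$ component, and with it the usefulness of the trapdoor $T_{B_0}$, therefore disappears exactly when $h_{\mathsf{msg}} = 0$. The toy sketch below, in plain Python where the sizes, seed, and helper names are hypothetical, builds a simulated key and checks this identity for every message of length $\ell = 3$.

```python
import random
from itertools import product

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def signed_sum(mats, signs):
    """sum_i signs[i] * mats[i], entrywise over the integers."""
    return [[sum(s * M[i][j] for s, M in zip(signs, mats)) for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]

def mod_q(M, q):
    return [[x % q for x in row] for row in M]

def simulated_vk(A0, B0, Rs, hs, q):
    """Setup step 4: C_i = (A0 R_i + h_i B0) mod q."""
    return [mod_q(signed_sum([mat_mul(A0, R), B0], [1, h]), q) for R, h in zip(Rs, hs)]

rng = random.Random(5)
q, n, m, ell = 11, 2, 3, 3
A0 = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
B0 = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
Rs = [[[rng.randint(-2, 2) for _ in range(m)] for _ in range(m)] for _ in range(ell + 1)]
hs = [1] + [rng.randrange(q) for _ in range(ell)]          # h_0 = 1
Cs = simulated_vk(A0, B0, Rs, hs, q)

for msg in product((0, 1), repeat=ell):
    signs = [(-1) ** b for b in (0,) + msg]                  # dummy bit msg[0] = 0
    C_msg = mod_q(signed_sum(Cs, signs), q)
    R_msg = signed_sum(Rs, signs)
    h_msg = sum(s * h for s, h in zip(signs, hs))
    assert C_msg == mod_q(signed_sum([mat_mul(A0, R_msg), B0], [1, h_msg]), q)
```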

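Queries. B answers adaptive signature queries from A on any message msg as
follows:

1. Compute the matrix $R_{\mathsf{msg}} = \sum_{i=0}^{\ell} (-1)^{\mathsf{msg}[i]} R_i$.

2. Compute the scalar $h_{\mathsf{msg}} = \sum_{i=0}^{\ell} (-1)^{\mathsf{msg}[i]} h_i$.

3. If $h_{\mathsf{msg}} = 0 \pmod{q}$, abort the simulation.

4. Compute the matrix $F_{\mathsf{msg}} = [A_0 \mid A_0 R_{\mathsf{msg}} + h_{\mathsf{msg}} B_0] \in \mathbb{Z}_q^{n\times 2m}$.

5. Find a short random $d \in \Lambda^\perp(F_{\mathsf{msg}}) \subset \mathbb{Z}^{2m}$, using the trapdoor $T_{B_0}$.

- Do so by sampling $d \sim \mathcal{D}_{\mathbb{Z}^{2m},\sigma}$ given $F_{\mathsf{msg}}\, d = 0$, using the procedure of
Lemma 23, with $T_{B_0}$ as a short basis for $\Lambda^\perp(h_{\mathsf{msg}} B_0)$ per Lemma 19.

6. Output the digital signature $\mathsf{sig}_{\mathsf{msg}} = d \in \mathbb{Z}^{2m}$.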

Forgery. B receives from A a forged signature $d^*$ on a new (unqueried) message
$\mathsf{msg}^*$, and does:

1. Compute the matrix $R^* = \sum_{i=0}^{\ell} (-1)^{\mathsf{msg}^*[i]} R_i$.

2. Compute the scalar $h^* = \sum_{i=0}^{\ell} (-1)^{\mathsf{msg}^*[i]} h_i$.

3. If $h^* \ne 0 \pmod{q}$, abort the simulation.

4. Separate $d^*$ into $[\, d_1^* ;\ d_2^* \,]$.

5. Return $e_0 = d_1^* + R^* d_2^* \in \mathbb{Z}^m$ as solution to $A_0 e_0 = 0 \pmod{q}$.

Lemma 26 shows that the answer $e_0$ will be small and non-zero with
good probability, and thus a valid $(q, n, m, \beta)$-SIS solution for the stated
approximation $\beta$. (An instantiation of $\beta$ is given in §3.4.)
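The key identity behind step 5 is that, when $h^* = 0$, the forged signature is a kernel vector of $[A_0 \mid A_0 R^*]$, so $A_0 (d_1^* + R^* d_2^*) = 0 \pmod q$. The toy sketch below is in plain Python. All values are hypothetical, chosen so that $h^* = 1 + 3 - 4 = 0$ for $\mathsf{msg}^* = (0, 1)$. The sketch builds a simulated key, enumerates every small $d^*$ that satisfies the verification equation for $\mathsf{msg}^*$, and checks that each extracted $e_0$ lies in $\Lambda^\perp(A_0)$. It does not check that $e_0 \ne 0$, which is the probabilistic part of Lemma 26.

```python
from itertools import product

q = 11
A0 = [[3, 7]]                                     # n = 1, m = 2 (toy)
B0 = [[5, 2]]
Rs = [[[1, -1], [0, 2]], [[2, 0], [-1, 1]], [[0, 1], [1, -2]]]   # R_0, R_1, R_2
hs = [1, 3, 4]                                    # h_0 = 1, h_1, h_2

def signs(msg_bits):
    return [(-1) ** b for b in [0] + list(msg_bits)]   # dummy bit msg[0] = 0

def signed_sum(mats, s):
    return [[sum(si * M[i][j] for si, M in zip(s, mats)) for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

# simulated verification key: C_i = A0 R_i + h_i B0 (mod q)
Cs = [[[(x + h * b) % q for x, b in zip(mat_mul(A0, R)[0], B0[0])]] for R, h in zip(Rs, hs)]

def extract(msg_star, d_star):
    s = signs(msg_star)
    if sum(si * h for si, h in zip(s, hs)) % q != 0:
        return None                                # h* != 0: abort
    R_star = signed_sum(Rs, s)
    m = len(A0[0])
    d1, d2 = d_star[:m], d_star[m:]
    return [d1[i] + sum(R_star[i][k] * d2[k] for k in range(m)) for i in range(m)]

msg_star = [0, 1]
C_star = [[x % q for x in signed_sum(Cs, signs(msg_star))[0]]]
F_star = A0[0] + C_star[0]
forgeries = [d for d in product(range(-2, 3), repeat=4)
             if any(d) and sum(f * x for f, x in zip(F_star, d)) % q == 0]
for d in forgeries:
    e0 = extract(msg_star, list(d))
    assert sum(a * x for a, x in zip(A0[0], e0)) % q == 0
print(len(forgeries), "small forgeries, all mapped into Lambda^perp(A0)")
```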

Outcome. The reduction is valid provided that B can complete the simulation
(without aborting) with a substantial probability that is independent of the
view of A and the choices it makes. The completion probability for B against
an arbitrary strategy for A is quantified in Lemma 27.

It follows from the bounds of Lemmas 26 and 27, under the assumption that
$Q \le q/2$, that if A existentially forges a signature with probability $\epsilon$, then B
solves the SIS instance with probability

$$\epsilon' \ge \pi_0\, (1 - q^{-1} Q)\, q^{-1} \epsilon \ge \pi_0\, \epsilon / 2q \ge \epsilon / 3q \quad \text{for } \pi_0 \ge 2/3.$$

With the stated lemmas, this concludes the security reduction. □
Lemma 26. Given a valid forgery $[\, d_1^* ;\ d_2^* \,]$ from A on some $\mathsf{msg}^*$ such that
$h_{\mathsf{msg}^*} = 0 \pmod{q}$, the vector $e_0 = d_1^* + R_{\mathsf{msg}^*} d_2^* \in \mathbb{Z}^m$ is, with probability
$\pi_0 = \Omega(1) \ge 2/3$, a short non-zero preimage of 0 under $A_0$, namely, $e_0 \in \Lambda^\perp(A_0)$
and $0 < \|e_0\| \le \beta$ for some polynomial function $\beta = \mathrm{poly}(\ell, n, m) = \mathrm{poly}(\lambda)$.

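Sketch. Let $h^* = h_{\mathsf{msg}^*}$ and $R^* = R_{\mathsf{msg}^*}$. Let $C^* = C_{\mathsf{msg}^*} = \sum_{i=0}^{\ell} (-1)^{\mathsf{msg}^*[i]} C_i$.
First, when $h^* = 0$, we have $C^* = A_0 R^* + h^* B_0 = A_0 R^*$, and thus for a valid
signature forgery $d^*$,

$$A_0 e_0 = A_0 (d_1^* + R^* d_2^*) = [\, A_0 \mid A_0 R^* \,] \begin{bmatrix} d_1^* \\ d_2^* \end{bmatrix} = [\, A_0 \mid C^* \,]\, d^* = 0 \pmod{q}.$$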

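Next, we show that $e_0$ is suitably short, which is true since $R^*$ is a sum of
$\ell + 1$ low-norm matrices $R_i$ with coefficients $\pm 1$, where the summands are all
short discrete Gaussian by construction of $R_0, \dots, R_\ell$. Since the matrices $\pm R_i$
are nearly independent with the same variance $V\{\pm R_i\} = V\{R_i\}$, we have

$$V\{R^*\} = V\Big\{\sum_{i=0}^{\ell} \pm R_i\Big\} \approx \sum_{i=0}^{\ell} V\{\pm R_i\} = \sum_{i=0}^{\ell} V\{R_i\} = (\ell + 1) \cdot V\{R_i\}.$$

Since the $\pm R_i$ closely approximate real normal Gaussian variables, so does $R^*$,
and therefore the Gaussian “vanishing tail” inequalities apply. In particular, as they
are almost independent discrete Gaussians with center 0 and parameter $\eta$, and
thus $E\{R^*\} \approx E\{R_i\} = 0$, we have $\Pr\{\|\pm R_i\| > \sqrt{m}\,\eta\} = \mathrm{negl}(m)$, and thus¹

$$\Pr\{\|R^*\| > \sqrt{\ell + 1} \cdot \sqrt{m}\,\eta\} \lesssim \Pr\{\|R_i\| > \sqrt{m}\,\eta\} = \mathrm{negl}(m).$$

Hence, with overwhelming probability, $\|e_0\| \le \beta$ for $\beta = \mathrm{poly}(\ell, n, m) = \mathrm{poly}(\lambda)$,
provided we set¹

$$\beta = \big(1 + \sqrt{\ell + 1} \cdot \sqrt{m}\,\eta\big)\, \sqrt{2m}\,\sigma,$$

since $\|e_0\| \le \|d_1^*\| + \|R^*\|\, \|d_2^*\|$ and $\|d^*\| \le \sqrt{2m}\,\sigma$ for a valid signature.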
Finally, it remains to show that $e_0 = d_1^* + R_{\mathsf{msg}^*} d_2^* \ne 0$. Suppose for an easy
case that $d_2^* = 0$; then for a valid forgery we must have $d_1^* \ne 0$ and thus $e_0 \ne 0$.
Suppose on the contrary that $d_2^* \ne 0$. In that case, $0 \ne \|d_2^*\| \le \sqrt{2m}\,\sigma \ll q$,
and thus there must be at least one coordinate of $d_2^*$ that is non-zero modulo $q$.
W.l.o.g., let this coordinate be the last one in $d_2^*$, and call it $y$. Let $r^*$ be the last
column of $R^*$, and let $r_i$ be the last column of $R_i$ for each $i$. As $R^* = \sum_{i=0}^{\ell} (-1)^{\mathsf{msg}^*[i]} R_i$,
we have $r^* = \sum_{i=0}^{\ell} (-1)^{\mathsf{msg}^*[i]} r_i$, where the coefficients $\pm 1$ depend on the message
bits. We focus on $r_1$: the last column of the matrix $R_1$ associated with the first
message bit $\mathsf{msg}[1]$. Let $v = (-1)^{\mathsf{msg}^*[1]}\, y\, r_1$. The expression of $e_0$ can be rewritten
$e_0 = y\, r^* + e_0' = v + e_0''$, where $v$ depends on $r_1$ and $e_0''$ does not.

The last step is to observe that the only information about $r_1$ available to A
is contained in the last column of $C_1$ (with “pollution” $h_1 B_0$, known in the worst
case). By leftover hash or a simple pigeonhole principle, there are a very large
(exponential in $m - n \log q$) number of admissible and equally likely vectors $r_1$
that are compatible with the view of A, and in particular more than six of them.
Since A can set the bit $\mathsf{msg}[1]$ in one of two ways, it follows that A cannot know
the value of $v$ with probability exceeding one third. At most one such value can
result in a cancellation of $e_0$, for if some $v$ caused all coordinates of $e_0$ to cancel,
then every other $v$ would fail to do so. We deduce that $\pi_0 = \Pr\{e_0 \ne 0\} \ge 2/3$.
(In fact, we have $\pi_0 \ge 1 - \exp(-\Omega(m - n \log q)) \to 1$ as $\lambda \to \infty$.) □

¹ Without using any independence, we can show $\Pr\{\|R^*\| > (\ell + 1) \cdot \sqrt{m}\,\eta\} = \mathrm{negl}(m)$,
and accordingly set $\beta = \big(1 + (\ell + 1)\sqrt{m}\,\eta\big)\sqrt{2m}\,\sigma$, which is a factor $\approx \sqrt{\ell}$ worse.

Lemma 27. For a prime modulus $q = q(\lambda)$ and a number of queries $Q \ge 0$,
the simulation completes both the Queries and Forgery phases without aborting,
with probability

$$\frac{1}{q}\Big(1 - \frac{Q}{q}\Big) \le \Pr\{\text{completion}\} \le \frac{1}{q}.$$

In particular, for $Q \le q/2$, this probability is $\Pr\{\text{completion}\} \in [q^{-1}/2,\ q^{-1}]$
regardless of the adversary’s strategy.

Intuitively, we first observe that, provided B does not abort, the simulation
is (almost) perfect in the sense that the view of A has the same distribution as
in an attack against the real scheme (modulo a negligible sampling error owing
to the imperfection of TrapGen). In particular, A’s view remains independent of
B’s choice of $h_1, \dots, h_\ell$, simply because those values have no counterpart in an
actual attack environment.

Now, the adversary can always assume that it is facing a simulator instead
of a real challenger, and accordingly attempt to derail the simulation. Since the
necessity to abort, for a given adversarial strategy, hinges entirely on B’s secret
choice of random $h_1, \dots, h_\ell$, it suffices to show that these values remain mostly
unlearnable whatever A’s attack strategy.

To show this, we consider a hypothetical unbounded perfect adversary A and
show that, even with perfect Bayesian updating upon each new adaptive query
it makes, such an adversary is unable to infer enough information about $h_1, \dots, h_\ell$
to significantly affect the success probability of the simulation.

Proof. Consider the $\ell$-dimensional space $\mathbb{Z}_q^\ell$, which is the domain of the unknown
$(h_1, \dots, h_\ell)$, and recall that $h_0 = 1$. Denote by $\mathcal{H}_j$ the distribution of $(h_1, \dots, h_\ell)$
over $\mathbb{Z}_q^\ell$ as perceived by the adversary after the first $j$ signature queries have
been answered without aborting.

At the start of the attack, since the simulator’s selection of $(h_1, \dots, h_\ell)$ is a
uniformly random point in $\mathbb{Z}_q^\ell$, the adversary’s prior distribution $\mathcal{H}_0$ is necessarily
the uniform distribution $U(\mathbb{Z}_q^\ell)$ over $\mathbb{Z}_q^\ell$. For every query message $\mathsf{msg}_j$ that is
answered without aborting, A can prune from the support of $\mathcal{H}$ every point
$(h_1, \dots, h_\ell)$ that lies on the “incompatible” hyperplane $h_{\mathsf{msg}_j} = 0 \pmod{q}$.

Denote by $V_j$ the hyperplane thus eliminated after a successful $j$-th query.
Suppose by induction that $\mathcal{H}_{j-1} = U(W)$, a uniform distribution over some
support set $W \subseteq \mathbb{Z}_q^\ell$. By conditioning $\mathcal{H}_{j-1}$ on the new evidence gained at the
$j$-th query, namely that $(h_1, \dots, h_\ell) \notin V_j$, one obtains an updated or posterior
distribution $\mathcal{H}_j = U(W \setminus V_j)$, which is uniform over the smaller support set given
by $W \setminus V_j$. By induction on the number of queries, starting from $\mathcal{H}_0 = U(\mathbb{Z}_q^\ell)$,
we deduce that, after the $j$-th query, $\mathcal{H}_j = U(\mathbb{Z}_q^\ell \setminus \cup_{i=1}^{j} V_i)$.

In particular, after all $Q$ allowed queries have been made, the fully updated
posterior distribution $\mathcal{H}_Q$ in the view of the adversary is then

$$\mathcal{H}_Q = U\big(\mathbb{Z}_q^\ell \setminus \cup_{i=1}^{Q} V_i\big).$$

In other words, this shows that, in the event that B was able to answer all the
queries, the unknown vector $(h_1, \dots, h_\ell)$ remains equally likely to lie anywhere
in all of $\mathbb{Z}_q^\ell$ outside of the $Q$ query-dependent hyperplanes $V_1, \dots, V_Q$. Being the
result of perfect Bayesian updating from all available observations, this distribution
captures all the information about $(h_1, \dots, h_\ell)$ leaked by B to A during
the Queries phase.

To complete the argument, consider the hyperplane $V^* \subset \mathbb{Z}_q^\ell$ defined by the
scalar equation $h_{\mathsf{msg}^*} = 0 \pmod{q}$ corresponding to the forgery message $\mathsf{msg}^*$
chosen by the adversary. By the requirements of what constitutes a valid existential
forgery, we know that $\mathsf{msg}^* \ne \mathsf{msg}_j$ and thus $V^* \ne V_j$ for all $j$. (Indeed,
the purpose of adding a fixed dummy message bit $\mathsf{msg}[0]$ and setting $h_0 = 1 \ne 0$
is to ensure that any two distinct messages $\mathsf{msg}_j \ne \mathsf{msg}^* \in \{0,1\}^\ell$ always induce
distinct hyperplanes $V_j \ne V^* \subset \mathbb{Z}_q^\ell$.)

Since $V^*$ and $V_j$ are distinct affine subspaces of dimension $\ell - 1$ in $\mathbb{Z}_q^\ell$, we
have $|V^* \cap V_j| \le q^{\ell-2}$, whereas $|V^*| = |V_j| = q^{\ell-1}$ and of course $|\mathbb{Z}_q^\ell| = q^\ell$.
Consequently, $V^*$ and $V_j$ have at most a fraction $1/q$ of their points in common,
and more specifically $|V^* \setminus V_j| \ge (1 - q^{-1})\,|V^*| = (1 - q^{-1})\, q^{-1} |\mathbb{Z}_q^\ell|$ for all $j$.

Considering the event completion = A^^hi,..., h g) £ (Y* \ Vj)} and in¬ 
voking the union bound on this conjunction, we thus establish a lower bound, 

Pr{ completion} = Pr{(hi,..., h^) £ (V* \ U^Vj)} 

> (l - g -1 Q) Pr{(h 1; ..., hf) £ V*} = (l - g -1 Q) q^ 1 
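Conversely, we can trivially establish an upper bound,

$$
\Pr\{\mathsf{completion}\} = \Pr\Big\{(h_1, \dots, h_\ell) \in V^* \setminus \textstyle\bigcup_{j=1}^{Q} V_j\Big\} \le \Pr\{(h_1, \dots, h_\ell) \in V^*\} = |V^*| / |\mathbb{Z}_q^\ell| = q^{-1}.
$$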

In both cases the probability is over the simulator’s initial choice of $h_1, \dots, h_\ell$.

We have shown that the probability of completion without aborting lies
in the narrow range $[(1 - q^{-1} Q)\, q^{-1},\ q^{-1}]$, regardless of the adversary’s actions.
The lemma follows. □
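The two bounds can be sanity-checked by computing $\Pr\{\mathsf{completion}\}$ exactly over a uniform $(h_1, \dots, h_\ell)$ for a fixed forgery and fixed queries. This is an illustrative sketch only: the hyperplane form $1 + \sum_i (-1)^{\mathsf{msg}[i]} h_i = 0 \pmod q$ is a hypothetical choice consistent with $h_0 = 1$, and the toy values $q = 11$, $\ell = 3$, $Q = 4$ are arbitrary.

```python
from fractions import Fraction
from itertools import product


def completion_probability(q, ell, forgery, queries):
    """Exact Pr over uniform (h_1..h_ell) in Z_q^ell that h is on V* but on no V_j.

    HYPOTHETICAL hyperplane form: V_msg = {h : 1 + sum_i (-1)^msg[i] h_i == 0 mod q}.
    """
    def on_plane(h, msg):
        return (1 + sum(-x if b else x for x, b in zip(h, msg))) % q == 0

    points = list(product(range(q), repeat=ell))
    good = sum(
        1
        for h in points
        if on_plane(h, forgery) and not any(on_plane(h, m) for m in queries)
    )
    return Fraction(good, len(points))


q, ell = 11, 3
forgery = (1, 1, 0)
queries = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (0, 1, 1)]  # all differ from the forgery
Q = len(queries)

p = completion_probability(q, ell, forgery, queries)
lower = (1 - Fraction(Q, q)) / q  # (1 - Q/q) * q^{-1}
upper = Fraction(1, q)
print(lower <= p <= upper)  # True
```

Since each $|V^* \cap V_j| \le q^{\ell-2}$, the computed probability cannot drop below the union-bound value, whatever distinct queries are chosen.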


3.4 Lattice Parameters 

It is not obvious that the various parameters can be instantiated in a way
that satisfies the flurry of constraints and inequalities invoked in §[5] and §3.3. This
is necessary for us, later, to prove the security of the signature from a polynomial
average-case SIS problem that reduces to a worst-case lattice hardness assumption.
Example 28. To ensure that hard lattices with good short bases can be generated
(i.e., $m \ge 6 n \log q$), that our flavor of SIS has a worst-case lattice reduction
(i.e., $q \ge \beta \cdot \omega(\sqrt{n \log n})$), that the two-sided trapdoors can operate smoothly
(i.e., $\alpha$ sufficiently large), that vectors sampled using a trapdoor are difficult SIS
solutions (i.e., $\beta$ sufficiently large), etc., as a function of a security parameter $\lambda$, we
may choose a function $\omega(\sqrt{\log m})$, a constant $d_1 > 0$, and a threshold $\lambda_0 > 0$,
and for all $\lambda > \lambda_0$ set:

$n = \lambda$

$\eta = \omega(\sqrt{\log m})$

L = log m) 

$\alpha = \sqrt{\ell}\, m^{3/2}\, \omega(\sqrt{\log m})^4$
$\beta = \sqrt{2\ell}\, m^{5/2}\, \omega(\sqrt{\log m})^5$
$q = \sqrt{2\ell}\, m^{3}\, \omega(\sqrt{\log m})^6$

One must, however, keep in mind that the security reduction given in Theorem 25
holds only if $q > 2Q$, so it may be necessary to increase $q$ and the other parameters
beyond the baseline values listed above. We avoid this in §3.5.
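To see how the baseline values scale, here is a rough numerical sketch. It rests on several assumptions that are ours, not the paper's: $\omega(\sqrt{\log m})$ is replaced by the hypothetical stand-in $\log_2 m$ (which does grow faster than $\sqrt{\log m}$), logarithms are base 2, $m$ is taken as the smallest integer with $m \ge 6 n \log_2 q(m)$, and $q$ is left real-valued rather than rounded to a prime. The helper names are hypothetical.

```python
import math


def omega_sqrt_log(m):
    """HYPOTHETICAL stand-in for omega(sqrt(log m)); log2(m) grows strictly faster."""
    return math.log2(m)


def q_of(m, ell):
    """Baseline modulus q = sqrt(2*ell) * m^3 * omega(sqrt(log m))^6 (real-valued)."""
    return math.sqrt(2 * ell) * m**3 * omega_sqrt_log(m) ** 6


def baseline_params(lam, ell):
    """Smallest m with m >= 6 n log2 q(m) (assumption), then alpha, beta, q as in Example 28."""
    n = lam
    m = 2  # start at 2 so that log2(m) > 0
    while m < 6 * n * math.log2(q_of(m, ell)):
        m += 1
    w = omega_sqrt_log(m)
    return {
        "n": n,
        "m": m,
        "alpha": math.sqrt(ell) * m**1.5 * w**4,
        "beta": math.sqrt(2 * ell) * m**2.5 * w**5,
        "q": q_of(m, ell),
    }


def theorem_25_applies(params, Q):
    """Theorem 25's reduction needs q > 2Q; otherwise the refined analysis of §3.5 is needed."""
    return params["q"] > 2 * Q


params = baseline_params(lam=16, ell=16)
print(params["m"], theorem_25_applies(params, Q=2**20))
```

A linear search over $m$ is used for clarity; because $\log_2 q(m)$ grows only logarithmically in $m$, the loop terminates after a number of steps roughly proportional to $n \log m$.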


3.5 Refined Simulation Framework 

In the full version, we give a refined analysis of the scheme that lets us keep
the baseline $q$ even for very large $Q$. The idea is to replace the random scalars
$h_i \in \mathbb{Z}_q$ by block-diagonal matrices $H_i \in \mathbb{Z}_q^{n \times n}$ consisting of a repeated random
submatrix drawn from a full-rank difference group $\mathcal{G} \subset \mathbb{Z}_q^{k \times k}$ for a special $k \mid n$,
where any difference $G_1 - G_2 \in \mathcal{G}$ is either zero or an invertible matrix in $\mathbb{Z}_q^{k \times k}$.
Visually, a random input $k$-vector $v \in \mathbb{Z}_q^k$ is mapped to a random matrix $H_i$ using
an encoding map $\rho$ built from an FRD encoding $\varphi$, according to this picture:


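$$
\rho : \mathbb{Z}_q^k \to \mathbb{Z}_q^{n \times n}, \qquad
v \mapsto \begin{pmatrix} \varphi(v) & & & 0 \\ & \varphi(v) & & \\ & & \ddots & \\ 0 & & & \varphi(v) \end{pmatrix}
$$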

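The block-diagonal map $\rho$ can be sketched concretely once an FRD encoding $\varphi$ is fixed. The sketch below assumes one well-known FRD construction, not necessarily the one used in the full version: $\varphi(v)$ is the $k \times k$ matrix of multiplication by $v(x)$ in $\mathbb{Z}_q[x]/(f)$ for a monic irreducible $f$ of degree $k$. Because that quotient ring is a field and $\varphi$ is linear, $\varphi(u) - \varphi(v) = \varphi(u - v)$ is either zero or invertible. The toy values $q = 3$, $k = 2$, $n = 4$, $f = x^2 + 1$ and the helper names `mul_mod`, `phi`, `rho`, `rank_mod` are hypothetical.

```python
from itertools import product


def mul_mod(a, b, f, q):
    """Product a*b in Z_q[x]/(f); coefficient lists are low-degree first, f is monic."""
    k = len(f) - 1
    prod = [0] * (2 * k - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % q
    # Reduce degrees >= k using x^k = -(f_0 + f_1 x + ... + f_{k-1} x^{k-1}).
    for d in range(2 * k - 2, k - 1, -1):
        c = prod[d]
        if c:
            prod[d] = 0
            for t in range(k):
                prod[d - k + t] = (prod[d - k + t] - c * f[t]) % q
    return prod[:k]


def phi(v, f, q):
    """k x k matrix of 'multiply by v(x)' in Z_q[x]/(f); column i is v(x) * x^i."""
    k = len(f) - 1
    cols = [mul_mod(v, [int(t == i) for t in range(k)], f, q) for i in range(k)]
    return [[cols[c][r] for c in range(k)] for r in range(k)]


def rho(v, n, f, q):
    """Block-diagonal n x n matrix with n/k copies of phi(v) on the diagonal."""
    k = len(f) - 1
    assert n % k == 0, "the construction needs k | n"
    block, H = phi(v, f, q), [[0] * n for _ in range(n)]
    for b in range(n // k):
        for r in range(k):
            for c in range(k):
                H[b * k + r][b * k + c] = block[r][c]
    return H


def rank_mod(M, q):
    """Rank of M over the field Z_q (q prime) by Gauss-Jordan elimination."""
    M, rank = [row[:] for row in M], 0
    for c in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][c] % q), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], -1, q)  # modular inverse, Python 3.8+
        M[rank] = [(x * inv) % q for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c] % q:
                factor = M[r][c]
                M[r] = [(x - factor * y) % q for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank


q, k, n = 3, 2, 4
f = [1, 0, 1]  # x^2 + 1, irreducible over Z_3 (no roots mod 3)
vectors = list(product(range(q), repeat=k))
full_rank = all(
    rank_mod([[(x - y) % q for x, y in zip(ru, rv)]
              for ru, rv in zip(rho(list(u), n, f, q), rho(list(v), n, f, q))], q) == n
    for u in vectors for v in vectors if u != v
)
print(full_rank)  # True
```

The exhaustive check confirms, for these toy parameters, that the difference of any two distinct encodings $\rho(u) - \rho(v)$ has full rank $n$, since each diagonal block is an invertible $\varphi(u - v)$.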

Full-rank difference (FRD) families in $\mathbb{Z}_q^{n \times n}$ have been used as a plentiful IBE encoding,
able to represent as many distinct identities as possible, up to $q^n$.

Here, FRD families will serve in a very different way, internal to the simulator,
to turn the mixing coefficients $h_i$ into uniformly drawn matrices $H_i$ from a domain
whose size $q^k$ is just right as a function of the number $Q$ of queries, without
worrying about the modulus $q$. The benefit is that smaller moduli make signatures
smaller and faster, and security tighter. Remarkably, except for relaxing $q$,
the actual scheme is unchanged. The theorem below is proven in the full paper.
Theorem 29. If there exists a probabilistic algorithm A that creates an existential
signature forgery, in time $t$, with probability $\epsilon$, making $Q$ adaptive chosen-message
queries, then there exists a probabilistic algorithm B that solves the SIS
problem of Theorem 25 in time $t' \approx t$ with probability $\epsilon' \ge \epsilon / (6 q Q)$.

Since both theorems apply to the same scheme in our framework, we can pick $q$
independently of $Q$, and invoke Theorem 25 if $Q < q/2$ or Theorem 29 if $Q \gg q$.

Acknowledgments. The author thanks Shweta Agrawal, Dan Boneh, Ronald 
Cramer, David Freeman, and anonymous referees for valuable insights. 